Chapter 4 - Migrating the Build System

05/10/2007

Introduction and Goals

For a migration project, the Developing Phase is the time when the team builds the solution components — code and infrastructure, as well as documentation. Typically, this work consists of modifying existing code in a way that enables it to work within the new environment. When new code is written, it is generally the case that some aspect of the original component remains unchanged — for example, exposed APIs, or specific component behaviors. Both the modification of existing code and development of new code in this context are considered to be migrating activities. For this reason, this guide refers to the MSF Developing Phase as the Migrating Phase.

Although the development work is the focus of this phase and this guidance, all team roles are active in building and testing deliverables. For instance, the User Experience Role will be creating training materials if they are needed. Some development work may continue into the Stabilizing Phase in response to testing.

The phase formally ends with the Scope Complete Milestone. At this major milestone, the team gains formal approval from the sponsor and key stakeholders that all solution elements are built and the solution features and functionality are complete according to the functional specifications agreed upon during the Planning Phase.

The following list summarizes the major tasks that take place during this phase:

Develop the solution components.
Build or migrate system tests to be used during the Stabilizing Phase.
Build the solution incrementally in a series of daily builds.
Test the solution (perform code component, database, security testing, code reviews, and validate system tests).

Developing the Solution Components

For a build system migration, “developing the solution components” means identifying and then porting the individual solution components that make up that build system. Build processes tend to have many different components, including makefiles, applications such as sed and awk, shell scripts, and individual source files that are compiled and executed during the build process. You must ensure important functionality is maintained — that you have the modules, scripts, and tools to do what the build system needs to do. In some cases, you will develop new files or scripts that perform a specific function in the Windows environment. When stabilizing the system, you will make all of the components work better together to accomplish your goals.

When actually performing a UNIX build process migration, you must deal with the technical and compatibility issues that arise when moving from one environment with a particular set of tools to another environment that has a different implementation of those same tools. The following sections provide descriptions of many of the issues that may occur when migrating from UNIX (Solaris, specifically) to one of three different UNIX environment products for Windows.

File System Differences

The Windows and UNIX file systems have a number of differences beyond the direction of the path separator. The set of disallowed characters is different on each, and the maximum size of file and path names is different. Windows allows the specification of additional devices (as in A:, C:, and so on), while UNIX hides device location in a single-rooted file tree.

Compared to the FAT file system, the NTFS file system on Windows shares the most similarities with UNIX in semantics and behavior, and therefore minimizes the amount of file system-specific changes required in a migration. NTFS file systems support many UNIX features, such as case-sensitive file and directory names, directory traversal permissions, file ownership divided between a user owner and a group owner, three types of access times on files and directories, hard links, and a mechanism for file access permissions.

Case Sensitivity

On UNIX, file systems are case sensitive. On Windows, file systems can be case preserving or case sensitive. A case-preserving file system will allow a file to be named Makefile or makefile, preserving the case of the name, but it will not allow two files whose names differ only in case to exist in the same directory. The behavior depends on the file system and the subsystem used to access the file system. For utilities that depend on the Interix subsystem, FAT32 and SMB network-mounted file systems are case preserving; local NTFS file systems appear to be case sensitive.

If you are using the Win32 subsystem, then all file systems appear to be case preserving. They do not use case as a criterion when locating files. In other words, either Makefile or makefile can exist in the same directory, but not both, and either file can be accessed using the name with letters in any case, such as MAKEFILE, Makefile, or makefile.

If you decide to use a Win32-based UNIX toolset, such as MKS Toolkit, or if you want your build process to work on a network-mounted file system, then part of your migration process will be to check your build process to ensure all files in the same directory have different names regardless of case, such as in the example of Makefile and makefile.
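One way to run this check on the original UNIX source tree is sketched below. The function name case_collisions is hypothetical and is not part of any of the toolsets discussed here. It reads path names on standard input and prints, in lowercase, every name that occurs more than once when case is ignored.

```shell
# case_collisions: hypothetical helper, not part of MKS, Cygwin, or Interix.
# Reads one path name per line on standard input and prints (in lowercase)
# each name that appears more than once when letter case is ignored.
case_collisions() {
    tr '[:upper:]' '[:lower:]' | sort | uniq -d
}
```

A typical invocation would be find . -print | case_collisions, run from the top of the source tree. Any output identifies files or directories that would collide on a case-preserving file system.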

Path Name Syntax

The file name syntax on Windows varies depending on the UNIX environment you select.

In the Interix UNIX environment, only UNIX file name syntax is supported. The Interix environment provides a file system view with a single root, where root is the directory where Interix was installed. Access to different drives through drive letters is possible through the special directory /dev/fs (for example, /dev/fs/C/dir/file). Network shares are also available through the special directory /net (for example, /net/server/share/dir/file).

Cygwin also provides the illusion of a single-rooted file system by using its own implementation of mount points and a mount table. The MKS environment does not support the concept of a single-rooted file system, which means that UNIX absolute file names could be an issue when migrating to the MKS environment. One way to reduce this problem is to create symlinks for well-known UNIX directory names and have them point to the actual location in the MKS installation. For example, a symlink to /bin could be created as follows:

ln -s $ROOTDIR/mksnt /bin

Both the Cygwin and the MKS environments support Windows file name syntax, including drive letters (c:/dir/file) and UNC pathname conventions (//server/share/dir/file). These environments also support the use of both forward slash (/) and backward slash (\) as a separator.
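If scripts written for Cygwin or MKS contain Windows-style names and must later run under Interix, the two naming schemes described above can be mapped mechanically. The following is a minimal sketch. The function name to_interix_path is hypothetical, and the sketch assumes forward slashes only and an uppercase drive letter under /dev/fs, as in the /dev/fs/C example.

```shell
# to_interix_path: hypothetical helper that rewrites a Windows-style name
# (forward slashes only) into the Interix form described in the text.
# Assumption: the drive letter is written in uppercase under /dev/fs.
to_interix_path() {
    case $1 in
        [A-Za-z]:/*)
            # c:/dir/file -> /dev/fs/C/dir/file
            printf '/dev/fs/%s%s\n' \
                "$(printf '%s' "${1%%:*}" | tr '[:lower:]' '[:upper:]')" \
                "${1#?:}"
            ;;
        //*)
            # //server/share/dir/file -> /net/server/share/dir/file
            printf '/net/%s\n' "${1#//}"
            ;;
        *)
            # Already a UNIX-style name; leave it unchanged.
            printf '%s\n' "$1"
            ;;
    esac
}
```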

Reserved File Names

Certain file names are reserved in the Win32 subsystem (and, therefore, the reservations apply in the MKS and Cygwin environments). The following reserved device names cannot be used as the name of a file: CON, PRN, AUX, CLOCK$, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9. You must avoid using these names as a file name suffix or file name body, so you have to avoid names such as aux.c, file.aux or NUL.txt.

The Interix subsystem has no such reservations or restrictions. If you do decide to use a build environment based on the Interix tools, and you make use of these reserved file names, remember that these names will not be accessible from the Win32 environment or Win32 utilities.
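A source tree can be screened for the reserved names listed above before it is moved. The sketch below is one possible check. The function name is_reserved_name is hypothetical, and the check treats every dot-separated part of the file name as a candidate, ignoring case, which matches the aux.c, file.aux, and NUL.txt examples.

```shell
# is_reserved_name: hypothetical helper. Succeeds (exit status 0) when any
# dot-separated part of the file name is a reserved Win32 device name.
is_reserved_name() {
    basename "$1" |
        tr '[:lower:]' '[:upper:]' |
        tr '.' '\n' |
        grep -E -x -q 'CON|PRN|AUX|CLOCK\$|NUL|COM[1-9]|LPT[1-9]'
}
```

For example, is_reserved_name src/aux.c succeeds, while is_reserved_name src/main.c fails.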

File Name Character Set Restrictions

The character set allowed for file and path names is more restricted on Windows than it is on UNIX. The characters that cannot be used in Windows file and path names on the NTFS file systems are provided in Table 4.1.

Table 4.1 Characters Restricted from Windows File and Path Names

Character	Name
^A .. ^_	ASCII control characters (encoding values 1 through 31)
>	Greater than
<	Less than
*	Asterisk
:	Colon
"	Quotation mark
?	Question mark
\	Backslash
|	Vertical bar
/	Slash

The Interix subsystem provides some relief from these restrictions. The additional characters Interix allows are provided in Table 4.2.

Table 4.2 Additional Characters Allowed in Interix File and Path Names

Character	Name
^A .. ^_	ASCII control characters (encoding values 1 through 31)
*	Asterisk
:	Colon
?	Question mark
|	Vertical bar

Because NTFS file systems store file names using Unicode encodings, Interix can detect these special characters and map them to special reserved Unicode encodings. Only the Interix environment supports these mappings. The Cygwin and MKS environments, whose utilities are based on the Win32 subsystem, are not capable of displaying these characters.
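For the Win32-based environments, file names can be screened for the printable characters in Table 4.1 with a check such as the following sketch. The function name has_restricted_char is hypothetical. The sketch expects a single file name rather than a path (so the slash is not tested), and it does not test for ASCII control characters.

```shell
# has_restricted_char: hypothetical helper. Succeeds when the file name in $1
# contains one of the printable characters from Table 4.1 (slash excluded).
has_restricted_char() {
    # Inside a bracket expression every character, including the
    # backslash, is taken literally.
    printf '%s\n' "$1" | grep -q '[<>*:"?\|]'
}
```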

The Colon Character

Windows file systems do not allow the colon character in file names except in a reference to a drive letter that represents a mounted file system such as c:/tmp. There are two issues related to the colon character:

When a file name used in the original UNIX build process contains an embedded colon character, such as :rofix.
When a valid Windows file name containing a colon character (such as c:/tmp) is used as a target in a makefile.

When using the Interix environment, the first issue is not a problem because the Interix subsystem's file system semantics are nearly identical to UNIX semantics: colon characters can be used for Interix-based make utilities (/bin/make, gmake). The Interix subsystem automatically maps the colon character to a special character that Windows file systems accept. The second issue is not applicable because Interix does not recognize Windows path names (such as c:/tmp).

When using the Cygwin environment, the first issue is a problem because the Windows environment does not allow colon characters in file names. The second issue is not a problem; a target can be an absolute path name with a drive letter specifier.

When using the MKS Toolkit environment, the first issue is a problem because Windows does not allow colon characters in file names. The second issue is also a problem because of the way MKS make parses the makefile looking for targets. If the target file name contains a colon character, the colon character is treated as the separator between the target and the prerequisites. To solve this problem, you have to quote the target name. For example:

"c:/tmp/a.exe" :  "c:/tmp/main.obj"

The Period Character

The Win32 subsystem cannot deal with files that end with a trailing period (.). When using the MKS Toolkit or Cygwin environments, these types of files have to be renamed. If you are using the Interix environment, no change is necessary.
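Files that need to be renamed for this reason can be located on the UNIX side with find. The following is a small sketch; the function name find_trailing_period is hypothetical.

```shell
# find_trailing_period: hypothetical helper. Lists every name under the
# directory $1 that ends with a period, skipping the special entries
# "." and "..".
find_trailing_period() {
    find "$1" -name '*.' ! -name '.' ! -name '..'
}
```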

Symlinks

Windows file systems do not support symbolic links in the same way that UNIX systems define symlinks. However, Windows provides "junction points." Junction points only work on local NTFS file systems and only with directories, not files. By using junction points, you can graft a target folder onto another NTFS folder or mount a volume onto an NTFS junction point. Junction points are transparent to programs.

The MKS Toolkit ln command uses NTFS junction points to implement symbolic links. As a result, the -s option can be used only with directories; if it is used with a file, ln displays an error.

The Interix subsystem has implemented its own special mechanism to provide UNIX symlinks on any type of file system (NTFS or FAT, local or networked). The symlink files are special data files that are recognizable as symlinks only in the Interix environment. They are not recognizable to Win32 applications.

The Cygwin ln command also supports the -s option to create symlinks. It uses the Windows "shortcut" mechanism, where the source file is a special data file with a .lnk suffix. These files are treated like symbolic links only by Win32 applications that recognize this special .lnk suffix, including applications such as Explorer. These symbolic links can be created on any type of file system.

Hard Links

A hard link is just another term for a directory entry, which is the way names are associated with a file. The NTFS file system supports having multiple directory entries (or multiple names) for the same file. The UNIX utility used to create these links is known as ln, and all three UNIX environment portability products support this functionality.

Permissions and Security

The permissions and security models are different on Windows and UNIX: the two operating systems use inherently different mechanisms for user identification and resource access control. UNIX uses fairly simple user and group IDs, each represented by a 32-bit numeric value. Windows uses user and group security identifiers (SIDs), which are variable-length numeric values and are normally quite large; a typical SID in string form might read "S-1-5-21-1431262831-1455604309-1834353910-1000". For file protection, UNIX uses nine file permission bits, with three bits each for owner, group, and other permissions, whereas Windows uses discretionary access control lists (DACLs) that support more than 14 different access rights, each assignable to an individual user or group.

Each UNIX environment on Windows maps the UNIX paradigm onto Windows differently. The MKS utilities provide the simplest support, with very little support for accurate user and group IDs and very little mapping between file permissions and DACLs. The Interix environment provides the most faithful and accurate mapping, with unique IDs for both local and global domain accounts and an accurate translation mechanism between UNIX file permission bits and DACLs. The Cygwin environment implements mappings similar to those of Interix, but there are a few features that Cygwin does not handle as well.

There should be very few security-related issues in a build process. There may be a few occurrences of the chmod(1) or install(1) utilities — you must examine these and see if the security needs of the process are being met. For example, if an application is required to run in a special security context (such as a setuid application) and it is being installed using special permission bits, you need to determine if this type of installation is still required on Windows. Another example is when the build process itself automatically creates shell scripts that it must execute. To ensure the script is executable, the build process may invoke the chmod(1) utility. In this case, a simple

chmod +x scriptfile

may be necessary in your script or makefile. All the UNIX environments on Windows support this command.
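The case of a build step that generates a script and then makes it executable can be sketched as follows. The function name make_helper_script and the script contents are hypothetical. Note that the generated script assumes /bin/sh exists, which may require extra setup in the MKS environment.

```shell
# make_helper_script: hypothetical build step that writes a small shell
# script to the path given as $1 and then marks it executable.
make_helper_script() {
    printf '%s\n' '#!/bin/sh' 'echo "generated at build time"' > "$1" &&
        chmod +x "$1"
}
```

The build would then invoke the generated file directly, or through sh if the environment cannot resolve the interpreter line.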

A common utility found in build systems is the install utility. This is used to move or copy a file or executable to a common installation location and set the file's permissions and its owner and group identity (security and permissions problems are common when using install). All the UNIX environments on Windows support this command. It is part of the standard distribution for Interix and Cygwin. The MKS version can be found in the samples/ directory on the distribution media.

Remote File Systems

The standard UNIX networked file system is the Network File System (NFS). The standard Windows networked file system is the Common Internet File System (CIFS), which is based on Server Message Block (SMB).

There are Windows NFS clients and UNIX CIFS servers available to provide network file system interoperability. Even if you design your system so that the source files can stay on a UNIX file system and the build can be run on a Windows system, you need to make sure that the file names are compatible with Windows, as translated by either the NFS client or CIFS. The NFS and CIFS software perform name transformations on illegal names. Using NFS clients on the Windows computers, or a CIFS server such as Samba on the UNIX computer, means that the code itself does not need to be moved, but using NFS clients or Samba does not lessen any of the other obligations in the migration. For example, a case-sensitive remote file system may be an important requirement, especially if you use files named makefile and Makefile in the same directory.

Another problem to consider when using remote file systems is the need to keep the system clocks synchronized with each other. Because the make utility is so sensitive to time stamps on files, it is very important that the clocks always be synchronized. Otherwise, the make utility may not determine the proper prerequisites and dependencies. Fortunately, there is a time service that ships by default in all Windows systems beginning with Windows 2000. Systems that are part of a domain automatically synchronize with the Domain Controller. Other systems, such as UNIX systems and non-domain Windows systems, will need to have a time server set up properly.
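A quick way to spot clock skew between the build computer and a file server is to look for files whose time stamps lie in the future relative to the local clock. The following sketch shows one approach. The function name files_from_future is hypothetical, and it assumes the mktemp utility is available and creates its file on a local disk.

```shell
# files_from_future: hypothetical helper. Lists regular files under the
# directory $1 whose modification time is later than the local clock,
# which suggests that the clocks are not synchronized.
files_from_future() {
    # A freshly created local file carries the current local time.
    ref=$(mktemp) || return 1
    find "$1" -type f -newer "$ref" -print
    rm -f "$ref"
}
```

Run it on the remote source tree while no build is in progress; any output is a file that make could misjudge as newer than targets built locally.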

Migrating makefiles

The make utility can be found on all UNIX platforms, and a platform sometimes has several different versions of it. For instance, Solaris has three: /usr/ccs/bin/make, /usr/xpg4/bin/make, and /usr/lib/svr4.make. The last is distributed for users whose makefiles were ported from a System V Release 4-based system. The first two are based on Sun Microsystems' historical implementation, and they support many Solaris-specific features. The second is provided for users trying to create portable make environments based on the XPG and POSIX specifications.

On most UNIX platforms, the GNU implementation of make — known as gmake — is also available. This version of make features:

Conformity with the POSIX standard specification.
Compatibility with other versions of make, such as the BSD 4.3 and the System V versions.
Many of its own unique features.

If your build process already uses gmake, then you should probably continue to use it on Windows. This is convenient because gmake is available in all three UNIX environments. The following sections discuss many of the features in make that vary between historical UNIX implementations and the versions on Windows. This will help you identify potential problem areas in your migration of build environments.

make Startup

When you run make, it usually begins by reading a file containing all the default rules and predefined macro definitions. This is sometimes known as the startup or default makefile. To migrate your makefiles, you may need to add special rules to the startup makefiles on Windows.

For MKS make, the default location of this startup file is $ROOTDIR/etc/startup.mk. This file can be overridden by the MAKESTARTUP environment variable. For instance, when you install the MKS Toolkit for Developers or MKS Toolkit for Enterprise Developers, this environment variable is set to $ROOTDIR/etc/nutc.mk.

For Interix make, the default location is /usr/share/mk/sys.mk. The Interix make also uses the MAKESTARTUP environment variable to provide a means for the user to override the default file.

On Solaris systems, make will always read the first file it finds, searching first for ./make.rules and then for the default rules file located at /usr/share/lib/make/make.rules.

For gmake, the default rules are built into the program. They are not included by reading in a separate file as is done by other versions of make.

Including Other makefiles

In Solaris make, the include directive can be used to specify another file that should be processed as if its contents had been included at this line in the makefile. The word include must appear as the first seven letters of a line and be followed by a space or tab character.

MKS make, Interix make, and gmake also support this syntax.

Multi-line Comments

Historically, a comment started with a number sign (#) and continued until a non-escaped newline character was found. For example:

# this is a multi-line comment \
   line 2 of the comment \
   line 3 ends with a non-escaped newline and ends the comment

This definition is also mandated by the POSIX and UNIX standards. Solaris make, MKS make, and gmake all support this syntax. However, Interix make does not. The alternate way to write multi-line comments is to write multiple single line comments, each starting with a # character.
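Before moving a makefile to Interix make, the continued comments it contains can be located with a simple search. The following is a sketch; the function name continued_comments is hypothetical, and it only examines lines that begin with a number sign (optionally preceded by white space).

```shell
# continued_comments: hypothetical helper. Reads a makefile on standard
# input and prints, with line numbers, each comment line that ends with
# a backslash and therefore continues onto the next line.
continued_comments() {
    grep -n '^[[:space:]]*#.*\\$'
}
```

For example, continued_comments < Makefile prints nothing when every comment fits on a single line.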

Macros

Macros are simple variables where the macro name is assigned a string value — a value comprised of a string of characters. Macros can appear anywhere in the makefile. The syntax of a macro definition (or assignment) is:

VAR = string of characters

A macro value can extend across multiple lines by ending a line with a backslash (escaping the newline). It is traditional — but not necessary — to start the continuing line with white space because the white space makes it easier to read.

A macro is normally referenced with the syntax $(macroname). A single-character macro name can be referenced without the parentheses. When macros are referenced, they are expanded into their values. This expansion depends on the location of the macro in the makefile.

Basic macro usage is consistent between the different implementations of make. However, each implementation may support different features. Several of these features are described in the following sections.

Special Macro Names

There are macro names that are interpreted specially by make. Different versions of make have their own special macros, but these two are common:

SHELL

The SHELL macro is a special macro that does not inherit its value from the environment. This macro can be set only in the makefile itself. Its value specifies the program to be used when executing recipe commands. Normally, it is set to a full path name (such as /bin/sh). If you use this feature in your UNIX makefile, then you have to ensure this same full path name exists in your UNIX environment on Windows; otherwise, you will have to either change this value or remove this macro.

For MKS make, a value of /bin/sh can work, but only if you create the /bin/sh path name on your system. You could either create this directory and copy all the utilities you need (such as sh) from $ROOTDIR/mksnt or, if this is a local NTFS file system, create a symlink to $ROOTDIR/mksnt, such as:

ln -s $ROOTDIR/mksnt /bin

Note that you will have to create this path name on every file system from which you invoke make because the location of root (/) is relative to the file system of your current working directory.

An alternate solution would be to change the value of the SHELL macro to $MKSBIN/sh.exe or $ROOTDIR/mksnt/sh.exe.

MAKEFLAGS

MAKEFLAGS contains a list of make command flags, including macro assignments. These are used in make as if they were on the command line and are passed for any recursive invocations of make. Several command flags are not valid in MAKEFLAGS because they do not make sense in this context; see the make(1) man page for more specific information.

This special macro is created by make and contains flags and macro definitions used by the make command. If the MAKEFLAGS environment variable is set, then make interprets the values as if they were command line options. These are interpreted in addition to the options specified on the command line. When make starts up, it creates the MAKEFLAGS macro, which contains all the options and the macro definitions specified on the command line and from the MAKEFLAGS environment variable. This MAKEFLAGS macro value is always exported to the environment so that any nested make commands inherit the options and macro definitions from which the parent make was invoked.

Before the use of MAKEFLAGS, historical UNIX versions of make used a variable named MFLAGS that just maintained option arguments and not macro definitions. Many current versions of make continue to support MFLAGS only for backward-compatibility reasons.

Note that the format of the values stored in MAKEFLAGS is not consistent between the different versions of make. Some versions of make concatenate all the options together with or without a leading hyphen character (for example, -ek or ek), while others maintain each separately with a leading hyphen character (for example, -e -k). Also note that MKS make stores only the options in MAKEFLAGS; it does not store macro definitions given on the command line. If you perform nested (or recursive) calls to make, then for the MKS environment you will have to modify your makefiles to explicitly include any macro definitions you need passed down to the next level of make. In other versions of make, MAKEFLAGS handles this automatically.

Macro Assignments

The usual way a macro is assigned a value is with the form

VAR = value

where the value of value is evaluated and expanded when the macro is first used, not when the value is assigned.

Another variation of macro assignment is the syntax:

VAR := value

This assignment operator is supported by most versions of make, but it has a different meaning on Solaris make than on the other versions. In most versions of make, this operator causes the value to be scanned and expanded when it is defined instead of when it is used. On Solaris, this syntax is interpreted as a conditional macro assignment. This type of assignment is described in the "Conditional Macro Definitions" section in this chapter.

Command Replacement in Macro Assignments

Some implementations of make have introduced specialized types of assignments. For example, the make command on Solaris supports command-replacement macro assignments. That is, when the macro is assigned a value, it executes a shell command line and assigns the result to the variable. The syntax for this is

VAR:sh = shell_cmd_line

Interix make supports this syntax, but MKS make and gmake do not. If you choose to use MKS make, you might be able to replace this assignment by capturing the result of the shell_cmd_line before invoking make and passing the value in a command line macro definition, such as:

make  VAR="$(shell_cmd_line)"  <other operands  . . .>

If you are using gmake, then you could use the special shell function available in gmake. In this case, you could rewrite the macro assignment in the makefile as:

VAR := $(shell shell_cmd_line)
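When a makefile contains many such assignments, the rewrite for gmake can be done mechanically. The sed command below is a minimal sketch; the function name sh_assign_to_gmake is hypothetical. It only handles assignments that fit on one line, and each converted line should still be reviewed by hand.

```shell
# sh_assign_to_gmake: hypothetical helper. Reads a makefile on standard
# input and rewrites single-line assignments of the form
#     VAR:sh = command
# into the gmake form
#     VAR := $(shell command)
# All other lines pass through unchanged.
sh_assign_to_gmake() {
    sed 's/^\([A-Za-z_][A-Za-z0-9_.]*\)[[:space:]]*:sh[[:space:]]*=[[:space:]]*\(.*\)$/\1 := $(shell \2)/'
}
```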

Command Replacement in Macro References

When macros are referenced, their values are substituted in place of the macro. A macro reference can also be replaced by the output of a shell command: when a macro such as CMD is referenced in this way, the shell command held in its value is executed, and the result is substituted for the reference.

For example, if CMD holds a command that prints the contents of a file named object_file_list, and that file contains a list of object file names, then a target program that references CMD in this way will be dependent on these object file names and will be built using these same object files.

Solaris make and Interix make both support this feature. The gmake utility and the MKS make utility do not.

Macro Modifiers

Macros are normally expanded when referenced. When macros are expanded, the value or values in the expansion can be modified in various ways. This modifier syntax can vary between versions of make, and it is a potential migration issue.

The syntax for macro modifiers is:

VAR:modifier[:modifier …]

There are two modifier formats that are widely supported. The first is:

old_suffix=new_suffix

All versions of make support this modifier syntax. This is a suffix replacement modifier. Each word that contains the suffix old_suffix will have old_suffix replaced by new_suffix. For example:

OBJS=${SRCS:.c=.o}

will convert a list of C source file names in the variable SRCS into a list of object file names, which are then assigned to the variable OBJS.

The second modifier format is:

prefix%suffix = str1%str2

This is a pattern matching replacement modifier. The percent (%) on the left side of the equals (=) operator matches any character string. So, if a word contains both the prefix and suffix, then % will represent all those characters in between. This string will be substituted into all occurrences of % on the right-hand side. For example, a macro such as SUBDIRS can be assigned the result of applying this modifier to the value of another macro.

Any number of % characters can appear on the right-hand side.

Conditional Macro Definitions

A conditional macro definition assigns a value to a variable only while make is processing targets from a specific list. This conditional definition occurs only in the Solaris version of make, and it has the form:

target-list := VAR = value

This definition assigns the value to the indicated variable VAR while make is processing the target named target-list and any of its dependencies. The macro definition takes effect when processing only those targets and their dependencies.

Search Paths

Many large build systems put the source files in one or more different directories. The make utility provides a mechanism that can search one or more directories to find targets, dependencies, or .include files. Most implementations support the VPATH macro. This is a special macro whose value specifies a list of directories that make should search. The format of this directory list is the same as the PATH environment variable — each directory is separated by a colon (:) character.

The only known implementation that does not support this macro is MKS make, which uses a special target known as .SOURCE instead of a macro. The dependency list associated with .SOURCE is the list of directories that make should search.

If you are porting from UNIX to the MKS Toolkit, and you are using the VPATH macro, you will have to change the makefile to use .SOURCE. For example, change the makefile from:

VPATH = src:headers:../othersrc

to:

.SOURCE: src headers ../othersrc
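This change can also be applied mechanically. The awk command below is a sketch; the function name vpath_to_source is hypothetical. It assumes a single-line VPATH assignment whose directories contain no drive letters, because every colon in the value is treated as a separator.

```shell
# vpath_to_source: hypothetical helper. Reads a makefile on standard input
# and rewrites a single-line "VPATH = dir1:dir2" assignment into the
# MKS make form ".SOURCE: dir1 dir2". Other lines pass through unchanged.
vpath_to_source() {
    awk '
    /^VPATH[ \t]*=/ {
        sub(/^VPATH[ \t]*=[ \t]*/, "")   # drop the macro name and "="
        gsub(/:/, " ")                   # colons become spaces
        print ".SOURCE: " $0
        next
    }
    { print }'
}
```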

Default Rules and Default Macro Values

When make starts up, one of the first things it does is set up a list of default rules (such as suffix rules), and it initializes a set of default macro names.

For Solaris, MKS, and Interix, these definitions are located in their default "rules" or "startup" files (see the "make Startup" section earlier in this chapter). For gmake, these definitions are hard coded into the binary itself.

Many of the conventional macros and suffix rules are the same for most implementations of make. Names such as CFLAGS, LDFLAGS, CC, and LD are common. Solaris make and gmake have defined more macros and rules. Names such as COMPILE.c and LINK.c are available in Solaris and gmake, but are not in Interix or MKS. If you are migrating from Solaris, then you should check to make sure that all the predefined macros you depend on exist in the target implementation you have chosen. If not, then you must add them to your own environment.

File Name Suffixes

On UNIX, file name format is given no special meaning. Any specific formats are just historical conventions, and these conventions are generally not enforced or given special meaning by UNIX applications or the UNIX system itself. For example, object files normally have the .o file extension. But the file extension could just as easily be .obj or .object. The file name syntax is irrelevant to the C compiler or linker — it just checks if the file contains a valid object file format.

On Windows, file name suffixes are given special meanings. Many Windows applications associate special behaviors with files having specific suffixes. As part of your application or construction process migration, you may find that you have to create file names with Windows-specific suffixes in order for them to be recognized properly by other Windows applications. This may affect your makefiles or configuration scripts.

For example, the shells in the MKS Toolkit will run an executable binary file only if the file has a .exe file extension. Otherwise, it will try to interpret it as a shell script, which results in an error. In the build process itself, this usually is not a problem because the executable files are normally created by the cc or gcc command, and these compilers automatically create executables with a .exe suffix. For example, in the MKS Toolkit and Cygwin, the command cc -o hello hello.c (replace cc with gcc in the case of Cygwin) creates a file named hello.exe. Even though you may have explicitly asked for an executable file with no suffix, you get one with the proper suffix.

The more prevalent problem is in shell scripts or in makefiles where the executable file name is explicitly referenced. The file name referenced probably will not match the generated executable file name because of the new .exe suffix. In makefiles, it will be target names that will be affected the most. For example, the rule

hello:
	cc -o hello hello.c

will cause the file hello.exe to be created; but hello.exe is not the target name, hello is. Because the target file hello is never created, make will always execute this recipe.

In some cases (such as Cygwin), the gmake utility has been modified so that this special case is handled automatically and the mismatch between the names hello and hello.exe does not occur.

With Interix, there is a different problem with file suffixes. The wcc compiler script does not create executable files with a .exe file extension. This is good for the build itself: all the assumptions and dependencies about file name syntax remain the same, so your build and construction tools continue to behave properly. But it may mean that the final Windows application binary is not in a proper Windows file name format, so that it cannot be executed by other Windows applications (such as Explorer). In this case, you can add a new command to the makefile to rename the file, or you can create a post-build script to rename your final binaries into a file name format that suits your needs.
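A post-build rename step of this kind can be very small. The following sketch assumes the final binaries are passed as arguments; the function name add_exe_suffix is hypothetical, and you may prefer to copy rather than rename if later make runs still need the suffix-free target.

```shell
# add_exe_suffix: rename each named file to NAME.exe unless it already ends
# in .exe. Files that do not exist are skipped silently.
add_exe_suffix() {
    for f in "$@"; do
        case $f in
            *.exe)
                ;;  # already in Windows form, nothing to do
            *)
                if [ -f "$f" ]; then
                    mv "$f" "$f.exe"
                fi
                ;;
        esac
    done
}
```

It could be called at the end of the build as add_exe_suffix hello, where hello stands for whatever your final binaries are called.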

Implicit Rules

Implicit rules are the cornerstone of how make works — they associate a target with a prerequisite by matching specific file names based on general patterns in the file name. There are two common types of implicit rules: suffix rules and pattern matching rules. All implementations of make support suffix rules, and these rules are based on matching the patterns of file name extensions.

Some implementations support more flexible "pattern-matching" rules that allow pattern matching on any part of the file name, not just the suffix. Sometimes this type of inference rule is called a meta-rule.

Suffix Rules

Every make implementation has a set of implicit suffix rules. These rules define how different types of files are created and perhaps transformed from a different type. These types of files are identified by their file name suffix. The make utility is one of the few UNIX utilities that are based on file name suffixes.

The suffix rule mechanism is based on file name extensions that are standard in the UNIX community and that are used to identify different types of files during the source code development and compilation process.

The definitions of all the default suffix rules are normally in the make startup configuration file. Check that the make program you use on Windows has the proper implicit rules to create files with no suffixes. For example, the following suffix rule

.c:
	$(CC) -o $@ $(CFLAGS) $(LDFLAGS) $<

can be used to create the hello target file using the following line:

make hello

However, the MKS make utility is missing this implicit rule. Without this rule, MKS make will not know how to build the file named hello from hello.c. Adding this rule to your makefile is easy to do, but it may not solve all the problems. With the MKS Toolkit, there is still the problem that cc creates a file named hello.exe (not hello), and make does not recognize that these two files are equivalent. So every time you run make, the target file name hello will never be found, and the implicit rule will always be executed.

To help deal with different suffixes between UNIX and Windows, the MKS make startup file defines some simple macros, such as $E, $O, $S, and $A, which are assigned to conventional Windows suffixes: .exe, .obj, .s, and .lib, respectively:

E = .exe
S = .s
O = .obj
A = .lib

If you plan to share your makefiles between Windows and UNIX, you could replace the occurrences of these suffixes in the makefile with these macros and then conditionally define these macros with the appropriate values for UNIX in your UNIX make startup file(s). For example:

E =
S = .s
O = .o
A = .a

This technique is not restricted to the MKS make environment; it could be used in any of the make environments.
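If you would rather not edit the startup files, the same macros can be supplied on the make command line from a wrapper script, because command-line macro assignments take precedence over definitions in the makefile. The sketch below is only an illustration: the function name suffix_macros and the platform labels windows and unix are hypothetical, and the suffix values are the ones listed above.

```shell
# suffix_macros: print the make macro assignments for one platform label.
# The labels "windows" and "unix" are illustrative, not part of any tool.
suffix_macros() {
    case $1 in
        windows) printf '%s\n' 'E=.exe O=.obj S=.s A=.lib' ;;
        unix)    printf '%s\n' 'E= O=.o S=.s A=.a' ;;
        *)       echo "unknown platform: $1" >&2
                 return 1 ;;
    esac
}
```

A wrapper could then run make $(suffix_macros unix) hello, leaving the expansion unquoted so that each assignment becomes a separate argument.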

Pattern Matching Rules

Pattern matching rules can be used to specify a relationship between a target and a dependency based on file name prefixes, suffixes, or both. An example of this type of rule is:

% : RCS/%,v
     co -l $<

Pattern matching rules were introduced into the Solaris make many years ago. Since then, both MKS and GNU have incorporated such rules into their versions of make. The Interix version of make does not support this mechanism.

Library and Archive Support

Some versions of UNIX make, such as Solaris and System V-based implementations, support a syntax for detecting dependencies on files stored in archives. Archives are created by the ar(1) utility and contain a collection of files called members. The most frequent use of archives is to store object files for use during compilation.

Historical versions of make support a syntax where archive members can be specified as a target or a prerequisite. Members of an archive are specified with the syntax

archive(member [member ...])

For example, if you have a makefile that contains

example.a : example.a(member1.o member2.o)

and then you type:

make example.a

make will create the files member1.o and member2.o (using an implicit rule such as .c.o, and assuming the source files are available), and then create the archive example.a, which will be populated with these two .o files.

Support for this syntax is provided by all versions of gmake and by the Interix version of make. This syntax is not supported by MKS make.

Note that with the Interix make, there is a problem with the implicit .o.a and .c.a rules, which are found in the file /usr/share/mk/sys.mk. These implicit rules are used when resolving the archive dependency relationships.

These rules normally look like the following:

.o.a:
	$(AR) $(ARFLAGS) $@ $*.o
	rm -f $*.o
.c.a:
	$(CC) -c $(CFLAGS) $<
	$(AR) $(ARFLAGS) $@ $*.o
	rm -f $*.o

but they can be fixed if they are changed to look like the following:

.o.a:
	$(AR) $(ARFLAGS) ${.ARCHIVE} $*.o
	rm -f $*.o
.c.a:
	$(CC) -c $(CFLAGS) $<
	$(AR) $(ARFLAGS) ${.ARCHIVE} $*.o
	rm -f $*.o

Revision Control (RCS, SCCS)

Version control and configuration management are normally handled separately and independently of the build management process. It is much easier to write separate tools than it is to write version control implicit rules for make and get it to work properly. There are some fundamental problems in allowing make to automatically update your source files, especially when you have multiple people working on the same files. It is much safer to put the responsibility on the programmer to get the correct file versions before the build.

Historical versions of UNIX make had special built-in support for Source Code Control System (SCCS), a fairly old revision control system developed on System V. Even though SCCS is still being used today, it is quickly being replaced by more advanced revision control and configuration management tools.

Support for SCCS had to be built into make in a different manner from other rules because SCCS manages files by prefixes and not suffixes. For example, when a C source file named hello.c is put under SCCS control, it creates a file prefixed with s. — for example, s.hello.c. Some built-in support for SCCS is provided in gmake, but not in the Interix and MKS versions of make.

Migrating to Windows from a UNIX system where you have been using SCCS is going to pose problems because the SCCS tools are not provided by the MKS Toolkit, Interix, or Cygwin products. SCCS is not particularly popular, but it may exist on your UNIX system. If SCCS is a requirement, then there are some potential solutions. There are several alternate SCCS implementations with freely available source code. You will have to get this code and port it to Windows yourself.

Another popular revision control system is called RCS. This system uses suffixes on files instead of prefixes (as SCCS does). This makes writing implicit rules for make much easier. RCS is available as a part of all the UNIX environment products: MKS, Interix, and Cygwin.

There are other source code control tools available, such as Perforce and ClearCase. Default make rules provided on UNIX systems do not address these systems. If you already have rules related to these systems, then you should be able to migrate them to Windows in any of the UNIX environments.

Dynamic Macros in Prerequisite Lists

Normally, dynamic macros cannot be used in the prerequisite list; they are confined to the recipe. There is one case where the dynamic macro $@ can be used in the prerequisite list. In this instance, an additional dollar sign ($) must be prepended to the macro, for example, $$@. This example shows how you can reference the current target name in the prerequisite list:

file1 file2 file3 : $$@.c

is equivalent to

file1 : file1.c
file2 : file2.c
file3 : file3.c

This feature is supported in Solaris, Interix, and MKS make, but not in gmake.

Special Targets

All versions of make support a set of special function targets that are treated specially by make. Some of these targets are well-known and behave the same way across all implementations — for example, the targets .DEFAULT, .IGNORE, .PRECIOUS, and .SUFFIXES.

But each implementation also supports special targets that are unique to that implementation. For example, the Solaris version of make defines several unique special targets, such as .INIT, .KEEP_STATE, .KEEP_STATE_FILE, and .MAKE_VERSION. Other versions of make have their own lists, such as .DELETE_ON_ERROR and .PHONY in gmake, or .OPTIONAL, .BEGIN, and .END in BSD versions of make.

Table 4.3 shows some of the non-conventional special targets.

Table 4.3 Non-conventional Special Targets

Solaris	Interix	Gmake	MKS
.INIT	.BEGIN	—	—
.DONE	.END	—	—
(uses VPATH macro)	.PATH	(uses VPATH macro)	.SOURCE
.KEEP_STATE	—	—	—
.KEEP_STATE_FILE	—	—	—
.SCCS_GET	—	—	—

Translating Compiler Options

The cc command (or gcc) is the programmer's interface to the compiler. It may be the compiler program itself, or it may be a wrapper or front-end to the compiler. Each compiler accepts different command line options. To convert the options, you need to know what the existing options are meant to do and what the equivalent options are in your chosen environment, if there are equivalents. The UNIX cc compiler supports many options. Many of these options are common across many of the UNIX platforms. But each platform also has its own set of distinct options. Table 4.5 illustrates some of the more common options used in the invocation of the C compiler on UNIX and the closest equivalents (if any) used by the various C compiler utilities on Windows.

If your build process requires compiler options that are not supported, you will have to find workarounds. You can edit both the MKS cc and Interix wcc utilities directly because they are scripts that provide a front-end to the Microsoft C compiler. Look at the Microsoft C compiler options, determine if there is an option that suits your needs, and then add instructions in the cc or wcc script to provide this functionality.

The locations of the various compilers are as follows:

Table 4.4 Compiler Locations

Compiler	Location
MKS Toolkit for Developers	$ROOTDIR/etc/compiler.ccg
MKS Toolkit for Enterprise Developers	$ROOTDIR/etc/nutccg/cc.ccg
SFU Interix	wcc (from https://www.interopsystems.com)
Cygwin tools	/bin/gcc

The difference between the two MKS Toolkit compiler scripts is that etc/nutccg/cc.ccg contains support for the MKS NuTcracker UNIX portability APIs. The etc/compiler.ccg script is used to build applications using just the Windows-provided libraries and functionality.

Table 4.5 displays many of the common C compiler options provided by the different compilers.

Table 4.5 Common Compiler Options

Description	Solaris	gcc	Interix wcc	MKS cc
Compile but do not link	-c	-c	-c	-c
Do not strip comments during preprocessing	-C	-C	-C	-C
Define a macro using definition variable	-D variable	-D variable	-D variable	-D variable
Preprocess only	-E	-E	-E	-E
Specify include directory incl_dir (uppercase I)	-I incl_dir	-I incl_dir	-I incl_dir	-I incl_dir
Specify a library lib (lowercase L)	-l lib	-l lib	-l lib	-l lib
Search for libraries in lib_dirs	-L lib_dirs	-L lib_dirs	-L lib_dirs	-L lib_dirs
Optimize	-O	-O	-O	-O
Name resultant file outfile	-o outfile	-o outfile	-o outfile	-o outfile
Preprocess only and save output to file with .i extension	-P	<na>	-P	-P
Create stripped executable	-s	-s	-s	-s
Compile to assembler code	-S	-S	-S	-S
Pass arg to linker or compiler or to component c	-W c , arg	-W c , arg	-W arg <pass to compiler> -Y arg <pass to linker>	-W/ arg
Work in strictly-conforming ANSI mode	-Xc	-ansi	-Xc	-Xc
Work in ANSI plus extensions (default mode)	-Xa	<na>	-Xa	-Xa
Work in ANSI and K&R mode	-Xs	<na>	<na>	-Xs
K&R mode	-Xt	<na>	<na>	-Xt

For MKS cc, there is a significant problem with the -l option. The standard -ll and -ly forms, which instruct the compiler to link the lex and yacc support libraries, fail. When these options are used in the MKS cc command, you get an error:

Warning: Could not locate -ll; assuming "l.lib"
LINK : fatal error LNK1181: cannot open input file l.lib'

The problem seems to be that the libraries for lex and yacc were installed with the wrong names, as lex.lib and yacc.lib (in $ROOTDIR/lib) instead of as l.lib and y.lib. To fix this problem, copy lex.lib to l.lib and yacc.lib to y.lib.
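The copy can be scripted so that it is repeatable on every build machine. This is a minimal sketch under the assumptions stated in the text (the libraries are named lex.lib and yacc.lib and live in one directory); the function name alias_lex_yacc_libs is hypothetical.

```shell
# alias_lex_yacc_libs: in the given library directory, copy lex.lib to l.lib
# and yacc.lib to y.lib so that -ll and -ly can be resolved.
# Existing l.lib or y.lib files are left untouched.
alias_lex_yacc_libs() {
    libdir=$1
    for pair in lex:l yacc:y; do
        src=$libdir/${pair%%:*}.lib   # part before the colon: long name
        dst=$libdir/${pair##*:}.lib   # part after the colon: short name
        if [ -f "$src" ] && [ ! -f "$dst" ]; then
            cp "$src" "$dst"
        fi
    done
}
```

In an MKS installation the call would be alias_lex_yacc_libs "$ROOTDIR/lib".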

Linkers

The linker utility is used to merge together object files to create executables or shared libraries. It resolves symbol names and memory references in the object files to form a complete executable. On UNIX, this utility is commonly called ld, and on Windows it is known as link.exe.

Normally, you create an executable or shared library using the compiler. The compiler automatically executes the linker in the appropriate manner. Typically, the compiler supports a command line option where you can specify linker-specific options, and these are passed through when the linker is invoked. However, sometimes you want to invoke the linker directly. If you want to do this, you need to determine why you did this on UNIX, whether it is still necessary when building on Windows and, if so, whether you need to modify the linker command line options. As with compilers, the linkers on different platforms support different command line options.

Table 4.6 Common Linker Options

Description	Solaris	GNU ld	Windows link.exe	MKS ld
Set the initial entry point to entrypoint	-e entrypoint	-e entrypoint	-entry: entrypoint	-e entrypoint
Add directory to library search directories.	-L directory	-L directory	<na>	-L directory
Include library x in link.	-l x	-l x	<must specify libx.lib as an operand>	-l x [searches for libx.a then x.lib]
Create an address map in mapfile	-M mapfile	-M <output goes to stdout>	-map:mapfile	-W/map:mapfile
Name resultant file outfile	-o outfile	-o outfile	-out: outfile	-o outfile
Strip debug symbols	-s	-s	-debug:none	-s

Dynamic Versus Static Libraries

When building executable programs, you usually link some libraries with your object files. There are two kinds of libraries: static and dynamic. Static libraries are used to build a stand-alone executable with no dependencies on other libraries. All the executable code is contained within the one file. With dynamic linking, the executable depends upon the presence of shared libraries that it attaches to when it starts to execute. These shared libraries can be used by many different executable programs. This way, the executable file is not as large as the static version.

The normal extension for a shared library on UNIX is .so, and on Windows it is .dll; the versioning and identification properties are different, but the differences may be hidden in the build system.

Migrating Shell Scripts

If your build process uses shell scripts, then you will have to check for potential problems that you may encounter running them on Windows. Fortunately, almost all shell scripts are written for the Bourne shell or for a POSIX.2 shell such as the Korn shell or bash. This means that the scripts themselves are rarely a migration issue. Most of the problems arise from operating system differences, such as file system naming syntax, and from the command line options supported by the other utilities.

One of the more common problems is the use of UNIX absolute path names such as /bin/cp or /tmp. Some of these, like /tmp, are handled by the UNIX environment automatically, but others are not. The MKS environment is the only one that does not support a single rooted file system where the root directory (/) always refers to a constant directory in the file system. The MKS environment provides multiple root directories, one for each mounted file system (that is, drive letter mount) because that is the way Windows works. One way to avoid changing these path names in the scripts is to create directory symbolic links so that common directories point to their corresponding counterparts in $ROOTDIR, in the same way that /bin and /usr/bin would point to $ROOTDIR/mksnt.
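Before deciding which links to create, it helps to inventory the hard-coded paths in your scripts. The following is a minimal sketch; the function name list_abs_paths and the set of top-level directories it looks for (/bin, /usr, /tmp, /etc) are assumptions chosen for illustration, so extend the list to match your own system.

```shell
# list_abs_paths: print FILE:LINE:TEXT for every line in the named files that
# mentions a path rooted at /bin, /usr, /tmp or /etc.
# Paths that follow a variable or another path component (for example
# $ROOTDIR/bin or ./tmp) are not reported. /dev/null is added so that grep
# always prints the file name, even when only one file is given.
list_abs_paths() {
    grep -n -E '(^|[^[:alnum:]_./$}])/(bin|usr|tmp|etc)(/|[[:space:]]|$)' "$@" /dev/null
}
```

Note that interpreter lines such as #!/bin/sh are reported as well, which is usually what you want because they are also absolute paths.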

Migrating Other Commands

Many UNIX commands are invoked during a build process, either in the target recipes in the makefiles or in shell scripts that are invoked during the build cycle. Part of the migration process is to ensure these commands continue to work on the Windows platform. The information under the following subheadings outlines various known issues with some of the more common utilities that can be found in a build process.

lex

The lex utility is a tool for creating C programs that can do lexical processing, or parsing, of character input. It is mainly used as an interface to the yacc utility. The lex utility reads in a set of rules with corresponding actions and generates a C program that can be used to execute these rules.

There are several popular implementations of lex: flex from GNU, MKS lex, and the original AT&T lex. Many of the commercial UNIX implementations are based on the AT&T version. The Interix implementation is based on GNU flex. All these are mostly compatible with each other, such that porting a file containing lex rules is fairly easy. Only a few changes might be needed.

For a discussion of the differences between GNU flex, AT&T lex, MKS lex, and the POSIX lex specification, you can consult the following resources:

https://www.gnu.org/software/flex/manual/html_mono/flex.html

https://www.mkssoftware.com/docs/wp/wp_lyuse.asp

yacc

The yacc utility is a parser generator. It creates a C program that can be used to parse input based on a specified set of rules and specifications (called a grammar). The user supplies a set of actions that correspond to these rules. These actions are invoked when the parser detects a match. In this way, the yacc utility is used to develop a wide range of language parsers, from those used in simple desk calculators to complex programming languages. The yacc utility is normally used in conjunction with the lex utility.

There are several popular implementations of the yacc utility: versions based on the original AT&T code; the version from the University of California, Berkeley (the public domain BSD version); and the GNU version, which is known as Bison. The Interix version is based on the BSD implementation, and the MKS version is based on code from SCO, which is a derivative of the original AT&T code.

However, all of these versions have based their implementations on the behavior of the original AT&T UNIX implementation. Therefore, migrating your yacc specification file between versions is usually trivial. The most frequent problems seem to occur in their internal limitations, such as buffer sizes and memory limits. In most cases, the GNU version, Bison, seems to be the most versatile and the least prone to these kinds of problems.

awk

The awk utility is a string manipulation and report generating language. It is used to apply complex text transformations based on pattern matching.

Versions of awk are consistent across platforms. Although the very oldest UNIX systems differentiate between “old” awk and “new” awk (nawk), modern UNIX systems use only the new awk. The only implementation that significantly extends awk is the GNU awk in Cygwin, or gawk. However, gawk is available for Windows, so if you rely on gawk on UNIX, you should use gawk on Windows.

One compatibility issue that occasionally occurs is the order of iteration over an array in a for statement. The order of iteration can vary between implementations because there is no guarantee of the order in which the program will traverse the array. In awk, all arrays are associative, so there is no implicit ordering by index; the for statement can run through the elements in any order.
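If a build step depends on the order of such output, one common remedy is to sort it after awk has finished rather than rely on the traversal order. The following sketch illustrates the idea with a word count; the function name count_words is hypothetical.

```shell
# count_words: count how often each whitespace-separated word occurs on
# standard input and print "word count" lines.
# The for (w in n) loop may visit the keys in any order, so the result is
# piped through sort to make it identical on every awk implementation.
count_words() {
    awk '{ for (i = 1; i <= NF; i++) n[$i]++ }
         END { for (w in n) print w, n[w] }' | sort
}
```

For example, printf 'b a\nc a\n' | count_words lists a first, then b, then c, regardless of which awk is installed.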

cp

The cp utility is the standard file copying tool on UNIX. There are two options, -r and -R, that do recursive copying (including directories and special files). The options may differ in their treatment of UNIX special files; the behavior of -R is specified by POSIX.2, but the behavior of -r is not. Examine your shell scripts and makefiles for instances of cp -r, and determine how the special files are meant to be treated.
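The search for cp -r can be automated. This is a rough sketch only: the function name find_cp_r is hypothetical, and the pattern looks for a lowercase r in the first option group after cp, so it will miss invocations built from variables or with options in other positions.

```shell
# find_cp_r: print FILE:LINE:TEXT for lines that invoke cp with a lowercase
# r in the first option group (for example -r or -pr). Uses of -R are not
# reported, and neither are other commands whose names end in "cp".
find_cp_r() {
    grep -n -E '(^|[^[:alnum:]_])cp[[:space:]]+-[[:alnum:]]*r' "$@" /dev/null
}
```

Run it over your makefiles and scripts, for example find_cp_r Makefile *.sh, and review each hit by hand.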

chmod

The chmod utility changes the UNIX permission bits on a file. UNIX permissions are based on three sets of bits. Windows permissions are based on access control lists, a more flexible but more complicated mechanism. Refer to the UNIX Application Migration Guide for a detailed discussion of permissions.

Because Windows has a much more complicated access control mechanism, most UNIX environments on Windows do not implement the mapping between UNIX and Windows mechanisms accurately. Only Interix does. Unless you are using Interix, be careful when using any UNIX command that deals with file permission bits because you may not get exactly what you expected.

diff

The diff utility compares two text files and provides a list of differences between them. Usually, these differences are presented in a form that can be placed into a script to turn one text file into the other.

Sometimes build tools or the developers themselves use the diff utility to determine if certain files are identical or are different. Most implementations of diff can successfully compare both text and binary files. However, the MKS Toolkit version cannot compare binary files using diff; this diff utility will generate an error message for each binary file. To work around this problem, use the utility cmp when you know you are comparing binary files.
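Where a script only needs to know whether two files are identical, cmp can replace diff for text and binary files alike. A minimal sketch follows; the function name files_identical is hypothetical.

```shell
# files_identical: succeed (exit status 0) when the two named files have the
# same contents, fail otherwise. The -s option suppresses all output, so only
# the exit status is used.
files_identical() {
    cmp -s "$1" "$2"
}
```

A build script could then write if files_identical new.o old.o; then echo unchanged; fi, where new.o and old.o are placeholder names.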

ln

The ln utility creates new file names and links them to an existing file. It is used instead of copying files. There are two types of links: hard links and symlinks. All the ln utilities in these systems can create hard links when using a Windows NTFS file system.

With Interix, the ln utility can create symlinks to files and directories; however, these symlinks are only recognized as such within the Interix environment. These symlink files will appear as system data files when accessed through the Windows environment.

The MKS ln utility can create a symlink to a directory by using Windows junction points. The MKS ln utility does not support the creation of symlinks to files.

The Cygwin ln utility supports the -s option to create symlinks, and it does this by creating a Windows shortcut or .lnk file. This is not a true symbolic link; rather, it is a Windows shell feature that only works with Windows utilities (such as Explorer) that recognize this feature. To many other Windows utilities (and Interix utilities) the symbolic link appears to be a regular file with binary data in it.
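Because symlink behavior differs so much between the environments, a build script that only needs a second name for a file can avoid symlinks altogether. The sketch below tries a hard link and falls back to a plain copy if the link cannot be created (for example, on a file system other than NTFS); the function name link_or_copy is hypothetical.

```shell
# link_or_copy SRC DST: make DST a hard link to SRC, or a copy of SRC when
# the hard link cannot be created. Error output from ln is discarded because
# the failure is handled by the fallback.
link_or_copy() {
    src=$1
    dst=$2
    ln "$src" "$dst" 2>/dev/null || cp "$src" "$dst"
}
```

The trade-off is that a copy does not track later changes to the original, so run the step again whenever the source file is rebuilt.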

sed

The sed command is a stream editor used to change streams of text. It is often used to transform output in some way to generate a new command file or makefile.

There are two common migration problems with sed. The first problem is that older versions of sed often feature only a subset of the basic regular expression package, and that the implemented subset may vary from system to system. The versions of MKS, Interix, and Solaris sed all claim POSIX.2 compatibility, so their regular expressions should be portable. The Cygwin version of sed is supposed to be POSIX.2 compatible, but there are known incompatibilities described in the sed(1) man page. If you are not using a POSIX.2-compatible version of sed, you need to examine the regular expressions used and determine what changes need to be made, if any. The regular expression characters period (.) and asterisk (*) are always supported. Interpretation of metacharacters in bracket expressions can vary between non-POSIX implementations.

The second problem is an output incompatibility. The sed command has an autoprint feature that automatically writes the matched lines. There is also a command (p) to print the results of an operation. Some implementations will output the line twice when autoprint is not disabled and when given the p command; others will print only one line. Both are valid under POSIX, so neither is an error. If your sed scripts make use of either print behavior, they might not be portable. Rewrite them to use the -n option and then explicitly print what you want.
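The rewrite described above can be sketched as follows. With -n, nothing is printed automatically, so each matching line is written exactly once by the explicit p command. The function name print_matches is hypothetical, and the pattern is inserted into the sed script as-is, so it must not contain a slash.

```shell
# print_matches PATTERN: copy to standard output only those input lines that
# match the basic regular expression PATTERN, once each.
# -n turns autoprint off; p prints the line explicitly.
print_matches() {
    sed -n "/$1/p"
}
```

For example, printf 'alpha\nbeta\nalpha2\n' | print_matches alpha writes alpha and alpha2, one line each.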

Building or Migrating System Tests

At this stage, you will need to build system tests. These will run using the sample makefile (or makefiles) you created earlier.

It is possible that you have tests that were used to check your UNIX build system; if so, they should be migrated to Windows to form the basis of your test system. However, build system tests are rare, so you will probably have to create them. You are implementing the test specification here.

Start with the verification tests used to determine which UNIX environment to use. Use those as the basis and develop the tests from there. These tests will be run regularly when the build system moves into the Stabilizing Phase.
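A build system test does not need a large framework; a shell function that runs a command and compares its output with an expected value is often enough to start with. The following is a minimal sketch under that assumption. The function name run_case and the PASS/FAIL report format are hypothetical, not part of any test specification in this guide.

```shell
# run_case NAME EXPECTED COMMAND [ARGS...]
# Run COMMAND with ARGS, compare its standard output with EXPECTED, print
# "PASS NAME" or "FAIL NAME", and return non-zero on failure.
# Error output of the command is discarded to keep the report readable.
run_case() {
    name=$1
    expected=$2
    shift 2
    actual=$("$@" 2>/dev/null)
    if [ "$actual" = "$expected" ]; then
        echo "PASS $name"
    else
        echo "FAIL $name"
        return 1
    fi
}
```

A test script might contain lines such as run_case suffix-check hello.exe ls hello.exe, with the command and expected output taken from your own verification tests.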

Integrating the Solution

It is important to remember at this stage that you are building an environment that will be installed on an end user's computer. Factors such as the environment variables may be important. These factors must be captured by the User Experience Role to document the build system for the end users.

Some of the components cannot be tested in isolation: there is little point to providing a stub of the cp command so that files are not actually moved; the effort in replacing the cp command is greater than the effort in putting stub files in for testing. Items such as shell scripts will automatically include and test some other components. Eventually, however, you must put all of the pieces together.

Because of the nature of build systems, developers will probably work on all of the different components at once, cycling through them: first, a set of changes to the makefiles to get file names, path names, and compiler options correct; then a quick make command to ensure that it works; then some fixes to recipe lines; then work on the shell script that gets called when one of the targets is built; and, finally, back to the makefile.

Test the Solution

Initial tests of the components are done on the development systems. They are moved to the sample site and tested again. Moving the build components to the testing and sample site is an important step in getting the build system to work. Just as it is occasionally necessary to eliminate all intermediate files and build an application from source alone, it will be necessary to clean the sample site of intermediate files, reinstall the current version of the build system, and then try again.

By the end of this phase, your build migration team should have the following:

A build system that runs, although it may still have problems.
Preliminary documentation that describes installing and using the build system, including file name mappings and use of the UNIX portability environment.
A set of system tests that will be used to benchmark the system's behavior. The measures were defined in the test plan and depend upon the specific needs of your organization.