Porting Shell Scripts

12/05/2007

Interix Technical Note

Abstract

This paper describes how to port shell scripts to the Services for UNIX 3.0 and Interix environment. Interix provides both the ksh and tcsh shells.

Porting Shell Scripts

This paper describes how to port UNIX shell scripts to the Services for UNIX 3.0 (SFU) product. SFU 3.0 includes the Interix subsystem and Interix utilities that provide a native UNIX environment on Microsoft® Windows® systems. The Interix utility collection includes both the ksh and the tcsh shells.

With Interix, shell scripts from existing UNIX systems can be easily migrated to Windows. Very few, if any, changes are normally required.

Several other commonly available UNIX-like environments for Windows provide shells and utility toolkits similar to those of UNIX and Interix, but these are based on the Windows subsystem rather than the Interix subsystem. Two very popular ones are:

The MKS Toolkit including its version of a “POSIX-compliant (Korn) shell”.
The Cygnus port of the GNU Toolkit (cygwin), including the bash shell.

Migrating shell scripts from other Windows-based toolsets to Interix is possible but may require a little more porting and conversion effort. The conversion largely involves removing the workarounds required by the differences between the Windows subsystem environment and the UNIX/POSIX open systems environment.

Porting Open Systems Shell Scripts

When porting a shell script from a UNIX system (such as SVR4 or BSD) to Interix, there are only two significant differences:

Absolute pathnames to the various drive-letter-based filesystems available on Windows require the format /dev/fs/X, where X is the drive letter designator.
By default, Interix stores binaries in one of three directories: /bin, /usr/contrib/bin and /usr/local/bin.
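
A script that hard-codes the location of a utility may need to look in each of these three directories. The following is a minimal sketch of such a lookup; the function name find_tool is hypothetical and is not part of Interix.

```shell
# Print the full path of a utility found in one of the default
# Interix binary directories; return non-zero if it is in none of them.
# (find_tool is an illustrative name, not an Interix command.)
find_tool() {
    tool=$1
    for dir in /bin /usr/contrib/bin /usr/local/bin; do
        if [ -x "$dir/$tool" ]; then
            printf '%s\n' "$dir/$tool"
            return 0
        fi
    done
    return 1
}
```

A script could then write awk_path=$(find_tool awk) and report an error if the lookup fails, instead of assuming a fixed location.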

Porting Win32 Shell Scripts

If you need to convert shell scripts written for a Win32-based shell it is very likely that the majority of your shell scripts will convert easily. Many of the conversion issues are due to the fact that the Microsoft Win32 environment is not POSIX-conformant. Utilities such as the shell, sed, awk, grep and so on, must make concessions to the underlying Windows subsystem environment. The major concessions involve pathname syntax and case-sensitivity. When converting these scripts to work under Interix, you must "undo" these concessions and return to the traditional UNIX environment offered by Interix.

The Issues

There are several UNIX environment implementations that run on Windows. Most of them are based on the Windows (Win32) subsystem which means they have some fundamental issues to work around because this subsystem is not POSIX/UNIX conforming. Some of these problems include:

File system and pathname differences, including case sensitivity, Win32 file naming limitations and the existence of device specifiers. Even though both Win32 and Interix can use NTFS file systems, only Interix makes use of all of the features of NTFS in terms of complete POSIX.1 file semantics with case sensitivity and more allowable characters in file names.
Device name differences. Windows uses drive letter specifiers and different names for the special files /dev/null and /dev/tty. Under Win32, there are other device names as well, including AUX, COM1, and so on. Those devices are specific to Win32.
Identifying the root directory. On Windows, there is the concept of the “current drive” or current filesystem. Thus the root directory ‘/’ could be one of potentially 26 different filesystems depending on which filesystem the “current drive” points to.
PATH variable values. In UNIX, the colon character is used to separate pathnames in these variables. Because Win32 uses the colon character in conjunction with a drive letter to specify fully qualified pathnames, some Win32 shells use the semicolon as the pathname separator. This notation is used in environment variables such as PATH, CDPATH and MANPATH.
Use of special characters in filenames. Microsoft filesystems, such as NTFS, do not support the use of certain characters in filenames. Characters such as colon (:), asterisk (*), question mark (?) and vertical bar (|) are not allowed. The Interix subsystem works around this restriction and does support these characters in filenames.
Identifying executable files. Some Win32 implementations only recognize files with suffixes as executable files. Even shell scripts use special file suffixes such as .sh or .ksh. And in some cases shell scripts are always executable even though the file’s execute permission is not enabled. On UNIX and Interix, a shell script is only executable if the file’s permission is properly enabled.
Text file format. In Windows, text files are terminated with the carriage return-linefeed <CR-LF> characters. On UNIX, text files are just terminated with the linefeed <LF> character. The Interix commands and utilities expect text files to be in the UNIX file format (lines terminated by a linefeed). These text files can be converted to the UNIX format via the command flip -u. For example, to convert the script qotd.ksh, you can use the command line:

flip -u qotd.ksh
Handling of #! in scripts. Some Win32 shells support the #! specifier on the first line of a script file but require the Windows pathname format (such as C:/bin/awk).
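
For the text file format issue above, it can help to find out which scripts still carry <CR-LF> line endings before running flip -u on them. The sketch below uses only grep and printf; the function name has_crlf is hypothetical.

```shell
# Succeed (exit status 0) if the named file contains at least one
# line that ends in a carriage return, that is, a <CR-LF> line ending.
# (has_crlf is an illustrative name, not an Interix command.)
has_crlf() {
    cr=$(printf '\r')          # a literal carriage return character
    grep -q "$cr\$" "$1"
}
```

For example, a loop such as for f in *.sh; do has_crlf "$f" && echo "$f"; done lists the scripts that still need conversion.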

An Overview Of Conversion

The general conversion process is simple. In the Interix environment:

Convert the text file line ending format from DOS based <CR-LF> format to the UNIX based <LF> format using flip -u.
Copy the file to remove the file extension, if you choose.
Make the file executable with chmod +x.

These three steps can be handled automatically with a simple shell loop. Assuming your Win32 shell scripts use the extension .sh, the loop could be:

for i in *.sh; do
   flip -u "$i"
   new=$(basename "$i" .sh)
   cp "$i" "$new"
   chmod +x "$new"
done

The remaining steps require changes in the shell script itself. These changes are difficult to automate and are usually done by hand.

You may have to convert backslashes (\) to forward slashes (/) in pathnames. This usually isn’t necessary because all of the Win32 shells support the use of forward slashes as pathname separators.

Convert any drive identifiers in pathnames from the X:/ format to /dev/fs/X/ format. In an Interix pathname, the drive letter must be in uppercase. It is also possible to create symbolic links to drives in /dev/fs, so you could create /c as a link to /dev/fs/C. Note: this practice is not recommended.

Convert the values of any PATH environment variables to use colons instead of semicolons.

Convert the use of NUL to /dev/null and convert CON to /dev/tty.

Check the interpreter arguments to any #! line.
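
One way to find #! lines that need attention is to test whether the interpreter they name actually exists. This is only a sketch under that assumption; the function name check_shebang is hypothetical, and a script whose first line does not start with #! is simply skipped.

```shell
# Report a script whose "#!" line names an interpreter that is not an
# executable file, for example a Windows-style path such as C:/bin/awk.
# (check_shebang is an illustrative name, not an Interix command.)
check_shebang() {
    first=$(sed -n '1p' "$1")
    case $first in
    '#!'*) ;;                  # has an interpreter line, keep checking
    *) return 0 ;;             # no "#!" line, nothing to check
    esac
    # Drop the "#!" prefix and any arguments, leaving only the pathname.
    interp=$(printf '%s\n' "$first" |
        sed -e 's/^#![[:space:]]*//' -e 's/[[:space:]].*$//')
    if [ ! -x "$interp" ]; then
        printf '%s: interpreter not found: %s\n' "$1" "$interp"
        return 1
    fi
}
```

Any script reported this way still has to be corrected by hand, because the right Interix location for the interpreter depends on the installation.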

If your shell scripts make use of Windows environment variables (see Control Panel->System->Advanced->Environment Variables), or if your shell scripts launch Win32-based programs that use environment variables, and these values contain pathnames then you will need to convert these values:

Use the winpath2unix utility to convert pathname values stored in environment variables from the Windows format to the Interix format.
Before executing any Win32-based programs, use the unixpath2win utility to convert the values of any required environment variables back to Win32 format, or the Win32 program will not understand the value of the variable.

Besides environment variables, command line pathname arguments to Win32 programs may also need converting using unixpath2win.

Your approach depends upon your environment. For many installations, it will be enough to approach it iteratively, converting the scripts a bit at a time until all special cases have been covered. For other installations, it may be worthwhile to write an extensive awk script to heuristically handle most of the cases.

In all but the most straightforward cases, the conversion should be verified manually.

Pathnames

When using Interix, the characteristics of file and path names differ from those under the Windows subsystem. In particular:

Filesystems mounted to specific drive letters are accessed via the notation /dev/fs/X/ rather than X:/ or x:/. The drive letter must be in uppercase.
Shared network filesystems can be accessed using the notation /net/machineName/shareName/filename rather than the UNC notation of //machineName/shareName/filename.
On NTFS filesystems, certain characters are usable in Interix filenames that are not allowed by Win32-based programs, such as colon (:), asterisk (*), question mark (?) and vertical bar (|).
Filenames are case-sensitive (e.g. /SFU is different from /Sfu and from /sfu).

Here are some approaches for identifying and dealing with these differences.
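
For the network share notation, a sed filter can handle the common cases. The following is a rough sketch that assumes a UNC pathname starts a line or follows a space, an equals sign or a double quote; the function name unc_to_net is hypothetical, and the result should be reviewed by hand.

```shell
# Rewrite //machineName/shareName/... as /net/machineName/shareName/...
# Only pathnames at the start of a line, or following a space, "=" or a
# double quote, are changed, so strings such as http://host/ are left alone.
# (unc_to_net is an illustrative name, not an Interix command.)
unc_to_net() {
    sed -e 's;^//\([^/ ][^/ ]*\)/;/net/\1/;' \
        -e 's;\([ ="]\)//\([^/ ][^/ ]*\)/;\1/net/\2/;g'
}
```

The filter reads a script on standard input and writes the converted text to standard output, for example unc_to_net < oldscript > newscript.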

Converting Drive Letters

Different shells handle drive letters in different ways. Normally the use of the Win32-style drive letter format (for example, C:/) is supported. But in the Cygwin environment, drive letter filesystems can be mounted anywhere. Normally they are automatically mounted to /cygdrive/X (where X is the drive letter).

Depending upon your scripts, you may be able to convert pathnames trivially. For example, if your scripts only use two or three drive designators (say, C:, c:, D: and d:), you may be able to convert all of your scripts with a simple sed script. Suppose your scripts contain these pathnames:

LIBDIR=c:/mscv40/lib
SCRIPTDIR=C:/usr/share/scripts
SRC=D:/src
BUILDBIN=D:/build/bin

You can convert them with this sed script:

s/c:/C:/g
s/d:/D:/g
s;\([C-Z]\):;/dev/fs/\1;g

The output looks like:

LIBDIR=/dev/fs/C/mscv40/lib
SCRIPTDIR=/dev/fs/C/usr/share/scripts
SRC=/dev/fs/D/src
BUILDBIN=/dev/fs/D/build/bin

Before using this sed script, beware of other places that may contain colon characters, because these colons may be converted inappropriately:

The parameter expansion operators :=, :-, :? and :+
Option strings to getopts
Inside the text of comments
The colon is also a built-in shell command that always returns 0; it is sometimes used in endless loops or (in older shell scripts) as a comment character.

There may also be pathnames like c:autoexec.bat, where there is no \ character after the colon. These pathnames are legitimate under Win32 (they imply the last directory used on the specified drive) but are not legitimate under the POSIX file system.
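
A more conservative conversion is to rewrite a drive letter only when it is followed by :/ and is not part of a longer word. The awk sketch below takes that approach and is an alternative to, not a copy of, the sed script above; the function name convert_drives is hypothetical. It leaves the parameter expansion operators alone, and it deliberately does not touch drive-relative names such as c:autoexec.bat, which still need manual attention.

```shell
# Rewrite X:/ or x:/ as /dev/fs/X/ (drive letter forced to uppercase).
# A drive letter is converted only when it starts the line or follows a
# character that is not a letter, digit or underscore.
# (convert_drives is an illustrative name, not an Interix command.)
convert_drives() {
    awk '{
        out = ""
        rest = $0
        while (match(rest, /(^|[^A-Za-z0-9_])[A-Za-z]:\//)) {
            # The drive letter sits three characters before the end of the match.
            letter_pos = RSTART + RLENGTH - 3
            out = out substr(rest, 1, letter_pos - 1) \
                  "/dev/fs/" toupper(substr(rest, letter_pos, 1)) "/"
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }' "$@"
}
```

For example, the line LIBDIR=c:/mscv40/lib becomes LIBDIR=/dev/fs/C/mscv40/lib, while a line such as : ${TMPDIR:=C:/tmp} keeps its := operator and only the pathname changes.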

Handling Win32 Format Restrictions

Win32 tools often use names adapted to the 8.3 format of the FAT file system instead of the traditional name (for example, profile.ksh instead of .profile). The usual strategies involve truncating the name, adding an extension, compressing it by dropping characters, or replacing a dot prefix with another character. Some Win32 systems will use the traditional name even on an NTFS partition. For example, the MKS KornShell represents the .profile file as profile.ksh.

Once you have identified names used by your Win32 shell, you can change them with a sed script such as:

s/profile\.ksh/.profile/g

You may want to search for any string containing the executable file extension or the special character (such as the .sh extension) in order to catch items like this example from a Win32 shell script:

case $(uname) in
Windows*)  . $ROOT/etc/prof_nt.sh;;
*)          ;;
esac

You can change these pathnames with a sed command. Assuming your pathnames don't contain spaces, you could use:

sed -e 's/\([^ ]*\)\.sh/\1/g' oldscript > newscript

This would change the example script above to:

case $(uname) in
Windows*)  . $ROOT/etc/prof_nt;;
*)          ;;
esac

Case Sensitivity in Pathnames

The Interix environment is sensitive to upper and lower case letters in file and path names; Win32 is not.

Using a regular expression or an automated tool to extract pathnames so you can check them for case problems is difficult. Pathnames may refer to temporary (non-existent) directories or files; they may be relative filenames; they may contain variables (such as ${TMPDIR:-C:/tmp} ), or they may contain ., .., or ~/. Searching for strings containing the / character is a good first step.
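
As a rough first pass at that search, each line can be split into words and only the words containing a / kept. The sketch below does this with awk; the function name extract_paths is hypothetical, and the word separators (blanks, =, ; and double quotes) are an assumption that will not suit every script.

```shell
# Print every word that contains a "/" from the named scripts (or from
# standard input), one per line, sorted with duplicates removed.
# LC_ALL=C is set only so that the sort order is the same everywhere.
# (extract_paths is an illustrative name, not an Interix command.)
extract_paths() {
    awk '{
        n = split($0, words, /[ \t=;"]+/)
        for (i = 1; i <= n; i++)
            if (index(words[i], "/") > 0)
                print words[i]
    }' "$@" | LC_ALL=C sort -u
}
```

The output still contains variables, relative names and temporary files, so it needs to be reviewed and trimmed by hand before it is used as a list of pathnames to check.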

One possible approach to checking for case problems in pathnames used in shell scripts is to extract all pathnames from the set of scripts to be converted. This list could be massaged and sorted to create a file containing one unique pathname on each line. Assuming you store the list of pathnames in a file named pathlist, you can perform a first check with the shell command:

while read -r filename
do
if ! [ -e "$filename" ] ; then
     print "$filename doesn't exist"
fi
done < pathlist

This must be done on a local drive; it doesn't work on a network-mounted drive. This is because network mounted drives are not case sensitive in the Interix environment.
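
When a pathname is reported as missing, the cause is often only a difference in case. The following sketch lists the entries in the same directory whose names match when case is ignored; the function name suggest_case is hypothetical, and it only examines the last component of the pathname.

```shell
# For a pathname, print the entries in its directory whose names are
# the same as its last component when upper and lower case are ignored.
# (suggest_case is an illustrative name, not an Interix command.)
suggest_case() {
    dir=$(dirname "$1")
    base=$(basename "$1")
    # -i ignore case, -x match the whole line, -F treat the name literally
    ls -a "$dir" 2>/dev/null | grep -i -x -F -- "$base"
}
```

For example, if a script refers to makefile but the directory contains Makefile, suggest_case ./makefile prints Makefile.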

Filename pattern matching (especially in shell case statements) may or may not be case-sensitive depending upon your shell. Only the Cygnus bash shell always matches filename case in case statements and filename patterns.

The MKS shell ignores the case of pathnames in wildcard globbing unless the DUALCASE environment variable is set: normally the pattern f* matches files that begin with f and files that begin with F.

The PATH Variable

On traditional UNIX systems (and on Interix), the variables PATH, CDPATH, and MANPATH hold lists of directories separated by colons. On Win32 systems, where the colon is used to specify a drive device, a different separator is used, usually a semicolon. To work under Interix, the separator must be changed back to a colon.

This sed command will do most of the work:

/PATH/s/;/:/g

It will catch almost every definition except ones that use an escaped newline as a continuation character, such as this one:

PATH=c:/mks/bin;\
 d:/usr/local;d:/usr/bin

You could write an awk script to catch this, but escaped newlines are rare in shell scripts, so it may not be worth the effort.
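
If continuation lines do occur in your scripts, an awk filter along the following lines could handle them. This is a sketch, not a complete solution: like the sed command above, it changes every semicolon on a line that mentions PATH, and the function name fix_path_separators is hypothetical.

```shell
# Change ";" to ":" on lines that mention PATH, and on lines that
# continue such a line by means of a trailing backslash.
# (fix_path_separators is an illustrative name, not an Interix command.)
fix_path_separators() {
    awk '{
        if (continued || $0 ~ /PATH/) {
            gsub(/;/, ":")
            # Remember whether this line is continued on the next one.
            continued = ($0 ~ /\\$/)
        } else {
            continued = 0
        }
        print
    }' "$@"
}
```

Applied to the two-line definition above, both the first line and its continuation line have their semicolons replaced, while lines that do not mention PATH are left unchanged.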

Identifying the Installation Root

Because a package can be installed anywhere on a Windows based system, all systems must provide some method for the software to locate the installation hierarchy and any other required files.

MKS uses the environment variable ROOTDIR.

Cygwin and Interix have a single rooted file system, so “/” is always the same no matter what drive you may be on. For Interix, there is also the $INTERIX_ROOT environment variable, which exists as a backward compatibility feature for users upgrading from Interix 2.2. Because this variable is available, it is easy to convert MKS scripts using the following example:

sed -e 's/ROOTDIR/INTERIX_ROOT/g' oldscript > new

Converting NUL and CON

In the Win32 subsystem, the null device is named NUL and the terminal console is named CON. On UNIX systems and on Interix systems, these two devices are named /dev/null and /dev/tty. Each Win32 shell provides access to the null device and the terminal console, but through different names:

System	Null	Console
UNIX systems and Interix	/dev/null	/dev/tty
Cygwin toolkit	NUL or /dev/null	/dev/tty
MKS KornShell	NUL or /dev/null	CON or /dev/tty

If your shell scripts refer to other reserved Win32 device names, such as AUX or COM1, you must change them to an Interix equivalent. Some of these devices, such as LPT1 and AUX, are not available in Interix. However, COM1 is /dev/tty01 and COM2 is /dev/tty02. Items sent to the printer can be handled with the lp command.
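
Most uses of NUL and CON in shell scripts are redirections, so a sed filter that only looks at redirection targets covers the common case. The sketch below assumes the names are written in uppercase and appear directly after < or >; the function name convert_devices is hypothetical. Other uses, such as a device name passed as a command argument, still have to be found by hand.

```shell
# Replace NUL with /dev/null and CON with /dev/tty when they are the
# target of a redirection. Longer words such as CONFIG are not changed.
# (convert_devices is an illustrative name, not an Interix command.)
convert_devices() {
    sed -e 's;\([<>] *\)NUL\([^A-Za-z0-9_]\);\1/dev/null\2;g' \
        -e 's;\([<>] *\)NUL$;\1/dev/null;' \
        -e 's;\([<>] *\)CON\([^A-Za-z0-9_]\);\1/dev/tty\2;g' \
        -e 's;\([<>] *\)CON$;\1/dev/tty;'
}
```

For example, the line make > NUL 2>&1 becomes make > /dev/null 2>&1.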

Running Win32-based Programs

The Interix subsystem can run Win32-based programs. Many Windows programs look for file and pathname information in the environment or from the command line arguments. Win32-based programs will probably not understand pathnames in the Interix format. If your shell scripts launch Win32-based programs that require Windows formatted file and pathnames, you will have to make special allowances for those programs.

For example, suppose you have a Win32 compiler interface named compile.exe that expects the environment variable LIB to contain the name of the directory that contains library files. In the original Win32 shell script, compile was just invoked:

compile $*

In the Interix version, you must convert the value of the LIB environment variable from a POSIX-style pathname to a Win32-style pathname before you run the compile.exe program. After running the command, you should reset the environment variable if you will then be calling Interix programs which will examine its contents.

To convert pathnames between formats, Interix provides the unixpath2win and the winpath2unix utilities:

export LIB="$(unixpath2win "$LIB")"
/dev/fs/C/COMPILER/BIN/compile.exe "$@"
export LIB="$(winpath2unix "$LIB")"
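
If a script launches several Win32-based programs, the sequence above can be wrapped in a function. The following is a sketch under the assumption that only the LIB variable needs converting; the function name with_win32_lib is hypothetical. Instead of converting the value back with winpath2unix, it saves the original Interix value and restores it afterwards, which is a design choice of this sketch and gives the same end result.

```shell
# Run a command with LIB converted to Win32 format, then restore the
# original Interix value of LIB and pass on the command's exit status.
# (with_win32_lib is an illustrative name, not an Interix command.)
with_win32_lib() {
    saved_lib=$LIB
    LIB=$(unixpath2win "$LIB")
    export LIB
    "$@"
    status=$?
    LIB=$saved_lib
    export LIB
    return "$status"
}
```

The compiler would then be invoked as with_win32_lib /dev/fs/C/COMPILER/BIN/compile.exe "$@".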

For more information about Interix, see the following Web sites: https://www.microsoft.com/technet/interopmigration/unix.mspx
https://www.microsoft.com/technet/interopmigration/default.mspx.
