Friday, June 19, 2009

Shell Programming and PATH

As most readers of this blog will already know, the PATH environment variable is used to locate commands that are executed. Key things to remember as you read this post are:

  • Environment variables (including PATH) are inherited by child processes
  • Child processes are unaffected by the parent process subsequently changing PATH to something else
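Both points can be checked with a few lines of shell. The following is only a minimal sketch: it uses /bin/sh rather than ksh, and the two PATH values and the temporary file name are arbitrary choices made for the illustration.

```shell
#!/bin/sh
# Illustration only: the PATH values and temp file name are assumptions.
PATH=/usr/bin:/bin
export PATH

out=${TMPDIR:-/tmp}/path_demo.$$

# The child gets a copy of PATH when it starts, then reports it a second later.
sh -c 'sleep 1; printf "%s\n" "$PATH"' > "$out" &

# The parent changes its own PATH while the child is still running.
PATH=/bin:/usr/bin
export PATH

wait                      # let the background child finish
child_path=$(cat "$out")
rm -f "$out"

echo "parent PATH: $PATH"
echo "child PATH:  $child_path"
```

The child reports the value it inherited at startup, not the value the parent switched to afterwards.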

So what's the big deal? Suppose you have a shell script that calls ps -fe. It works great for you because you have /usr/bin first in your PATH. However, the guy down the hall who cut his teeth on a BSD system has /usr/ucb first, so the same script picks up the BSD-style ps, which interprets its options differently. If your shell script does not set PATH=/usr/bin:... prior to calling ps, it will work for you but give strange errors for the guy down the hall. Of course, your shell script could just specify /usr/bin/ps -fe...
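To see the effect without needing a machine that has both /usr/bin/ps and /usr/ucb/ps, here is a rough sketch that fakes the situation. The command name flavor and the sysv and bsd directories are hypothetical stand-ins created just for the demonstration.

```shell
#!/bin/sh
# Hypothetical stand-ins: two directories, each holding its own "flavor" command.
demo=${TMPDIR:-/tmp}/path_order.$$
mkdir -p "$demo/sysv" "$demo/bsd"
printf '#!/bin/sh\necho sysv\n' > "$demo/sysv/flavor"
printf '#!/bin/sh\necho bsd\n'  > "$demo/bsd/flavor"
chmod +x "$demo/sysv/flavor" "$demo/bsd/flavor"

# Same command name, different PATH order. Each $(...) runs in a subshell,
# so the PATH change does not leak back into this script.
mine=$(PATH="$demo/sysv:$demo/bsd:$PATH"; flavor)
theirs=$(PATH="$demo/bsd:$demo/sysv:$PATH"; flavor)

echo "my PATH order runs:    $mine"
echo "their PATH order runs: $theirs"

rm -rf "$demo"
```

The script text never changes; only the order of directories in PATH decides which program answers to the name.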

This brings up four different styles that turn up in shell scripts:

Style 1: Just hope for the best

#! /usr/bin/ksh

count=$(ps -fe | wc -l)
echo "There are $count processes running"

Style 2: Specify full path whenever calling a program

#! /usr/bin/ksh

count=$(/usr/bin/ps -fe | wc -l)
/usr/bin/echo "There are $count processes running"

Style 3: Create variables to store full path to all programs

#! /usr/bin/ksh

PS=/usr/bin/ps
WC=/usr/bin/wc
ECHO=/usr/bin/echo

count=$($PS -fe | $WC -l)
$ECHO "There are $count processes running"

Style 4: Set PATH to use the commands you want to use

#! /usr/bin/ksh

export PATH=/usr/bin
count=$(ps -fe | wc -l)
echo "There are $count processes running"

With Style 1, the script is only reliable for the subset of users that have the right version of ps first in their PATH.

A workaround for this is shown in Style 2. However, this example has an intentional problem that is somewhat common when this approach is used. Notice that wc is not specified by its full path. This will work fine until someone with a really messed up (or unset) PATH tries to execute the script.

Style 3 fixes the ps and wc problems, but has another small problem: calling /usr/bin/echo forces a fork() and exec*() to run something that could be more efficiently done via the shell's built-in echo. I'll talk more about this in a future post.
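One way to check whether a bare command name would be handled by the shell itself or by an external program is command -v. This is only a sketch; how_resolved is a hypothetical helper name, and it is written for /bin/sh rather than ksh.

```shell
#!/bin/sh
# how_resolved is a hypothetical helper, not something from the scripts above.
# command -v prints a bare name for a built-in or function, and a full
# path for an external program found via PATH.
how_resolved() {
    resolved=$(command -v "$1") || { echo "not found"; return 1; }
    case $resolved in
        /*) echo "external: $resolved" ;;
        *)  echo "shell built-in or function" ;;
    esac
}

how_resolved echo
how_resolved wc
```

Anything reported as external costs a new process each time it is called, which is what happens to echo once it is spelled /usr/bin/echo or $ECHO.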

Style 4 keeps the simplicity of Style 1, but ensures that each user will get the same version of the commands. The author of the script can tailor PATH to contain the minimum set to find the required commands and test the script to gain a high degree of confidence that the script will work for others.
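As part of that testing, a script can verify up front that its tailored PATH really does find everything it needs. The sketch below is one possible way to do it, not the only one: require_cmds is a hypothetical helper, and /usr/bin:/bin is an assumed PATH that may need adjusting for your system.

```shell
#!/bin/sh
# Assumed minimal PATH; adjust to wherever your system keeps ps and wc.
PATH=/usr/bin:/bin
export PATH

# Hypothetical helper: succeeds only if every named command can be found.
# The names that could not be found are left in $missing.
require_cmds() {
    missing=
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || missing="$missing $cmd"
    done
    [ -z "$missing" ]
}

if require_cmds ps wc; then
    count=$(ps -fe | wc -l)
    echo "There are $count processes running"
else
    echo "not found in PATH=$PATH:$missing" >&2
fi
```

Failing early with a clear message is friendlier than letting the script stumble halfway through on a "not found" error.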

I have a strong preference for Style 4. It retains the feel of using a shell interactively, keeps the code understandable, and performs reliably. But this doesn't mean that it is always the right thing to do. Consider the batch command. It doesn't set PATH, and rightly so. That is, if the following:

exec /usr/bin/at -qb $*

were replaced with

PATH=/usr/bin; export PATH
...
at -qb $*

this would change the environment that at(1) attaches to the job, potentially breaking it.

1 comment:

Unknown said...

Or:

#!/usr/bin/ksh

export PATH=$(/usr/bin/getconf PATH)


...will get you a PATH that includes all the POSIX/xpg4 compliant bins which will help in portability (assuming that is desirable).

--Brett