
Using LoadLeveler Steps

If you want to execute a number of very simple tasks, ones that involve little, if any, shell scripting, as a sequence of LoadLeveler steps, you may prefer to use LoadLeveler's own multiple job steps facility. That facility is a little tricky, and, in particular, you should not try to mix LoadLeveler steps with your own self-submitting shell scripts, because that may easily lead to confusion. Remember that if you do not use the LoadLeveler keyword #@executable, then, according to LoadLeveler's semantics, the LoadLeveler script itself becomes the executable. When that script is passed to, say, ksh for execution, all LoadLeveler keywords are ignored (to the shell they are just comments), and the whole script is executed in one go, even if you have separated portions of it with multiple #@queue directives.
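To see what this warning is about, consider a minimal sketch of the situation it describes (the shell path and the class name below are placeholders). LoadLeveler creates two steps from the two #@queue directives, but to the shell every # @ line is an ordinary comment, so each step executes the whole body and both echo commands run in both steps; nothing confines the first echo to the first step:

# @ shell = /bin/ksh
# @ job_type = serial
# @ class = test
# @ step_name = one
# @ queue
echo "meant for step one only"
# @ step_name = two
# @ queue
echo "meant for step two only"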

Consider the following LoadLeveler job description file:

#
# Common definitions for all three steps
#
# @ output = $(job_name).$(step_name).out
# @ error = $(job_name).$(step_name).err
# @ job_type = serial
# @ class = test
# @ notification = always
# @ environment = COPY_ALL
# @ job_name = hello
#
# The first step: compile the program.
#
# @ step_name = compile
# @ executable = /afs/ovpit.indiana.edu/@sys/gnu/bin/gcc
# @ arguments = -o hello hello.c
# @ queue
#
# The second step: run the program if the compilation was successful.
#
# @ step_name = run
# @ dependency = compile == 0 
# @ executable = /afs/ovpit.indiana.edu/@sys/gnu/bin/bash
# @ arguments = -c "exec hello"
# @ queue
#
# The third step: remove the binary if the run was successful.
#
# @ step_name = clean
# @ dependency = run == 0
# @ executable = /afs/ovpit.indiana.edu/@sys/gnu/bin/rm
# @ arguments = -e hello
# @ queue

When this script is submitted to LoadLeveler, three jobs are placed in the queue. Initially, the second and third jobs wait until the first one finishes; then the second job starts running while the third continues to wait; finally, the third job runs. I should add that the second and the third jobs will run only if their direct ancestor has exited without any problems, leaving an exit status of 0 behind.
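Assuming the job description above has been saved in a file called, say, hello.ll (the name is arbitrary), it is submitted once with llsubmit, and the three resulting steps can then be watched in the queue with llq:

llsubmit hello.ll     # submit once; all three steps enter the queue
llq                   # list the queued steps and their current states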

The script is conceptually divided into four chunks.

The first chunk is a preamble with definitions common to all three job steps.

The second chunk describes the first step: it invokes the GNU C compiler on the program hello.c and, if the compilation succeeds, produces a binary called hello.

The third chunk describes the second step: it will run only if the first step has left exit status 0 behind. That's what the directive

# @ dependency = compile == 0

is about. Observe a small complication. Instead of defining

# @ executable = hello

I have defined

# @ executable = /afs/ovpit.indiana.edu/@sys/gnu/bin/bash
# @ arguments = -c "exec hello"

The reason for this is that when the script is originally submitted to LoadLeveler, the file hello does not exist yet. Had I defined #@executable = hello here, LoadLeveler would have refused the job and flagged an error: all executables specified with the #@executable keyword must exist at the time the LoadLeveler script is submitted. The remedy is to specify my login shell as the executable instead, and then to substitute (with exec) the shell with the binary produced in the first step.

The fourth chunk describes the third step: it will run only if the second step has left exit status 0 behind. That's what the directive

# @ dependency = run == 0

achieves. It is your responsibility, as a programmer, to ensure that your program indeed leaves an exit status of 0 behind when it completes successfully.
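A quick way to check what status your program actually leaves behind is to run it interactively and inspect the shell's $? variable immediately afterwards; here the hello binary is assumed to sit in the current directory:

./hello
echo $?     # prints 0 if hello terminated cleanly, something else otherwise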

This step removes the binary generated by the first step. The command rm is invoked with the -e option, which leaves a trace in the hello.clean.err file:

rm: Removing hello

Can the same be achieved with shell scripting? Although I have warned you about possible pitfalls of mixing scripting and LoadLeveler steps, it is OK to do so, as long as your script does not attempt to resubmit itself. You might even consider the latter, but in that case you must carefully scrutinise the logic of both the shell script and the overlaying LoadLeveler script. Things may easily become convoluted, though not necessarily incorrect! Also, remember that the first occurrence of the keyword #@executable overrides the shell script for all subsequent steps. If a shell script is present in the LoadLeveler command file, all steps defined before the first occurrence of the keyword #@executable will see the same script. Consequently, the script itself must be able to recognise which particular step it is being executed as and differentiate its actions accordingly. That information can be obtained from the environment variable LOADL_STEP_NAME.

Here is an example of a 3-step LoadLeveler job, equivalent to the one discussed above, in which the actions are specified entirely using a shell script rather than three different #@executables.

# @ shell = /afs/ovpit.indiana.edu/@sys/gnu/bin/bash
# @ output = $(job_name).$(step_name).out
# @ error = $(job_name).$(step_name).err
# @ job_type = serial
# @ class = test
# @ notification = never
# @ environment = COPY_ALL
# @ job_name = hello
#
# @ step_name = compile
# @ queue
#
# @ step_name = run
# @ dependency = compile == 0 
# @ queue
#
# @ step_name = clean
# @ dependency = run == 0
# @ queue
#
echo step: $LOADL_STEP_NAME
case $LOADL_STEP_NAME in
   compile ) 
      gcc -v -o hello hello.c 2>&1 ;;
   run ) 
      hello ;;
   clean ) 
      rm -e hello 2>&1 ;;
esac
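Once all three steps have completed, the output and error directives above place each step's streams in its own pair of files: hello.compile.out, hello.run.out and hello.clean.out, plus the matching .err files. This makes it easy to check which branch of the case statement a given step took, for example:

cat hello.run.out     # should begin with "step: run", followed by whatever hello itself printed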

