Organizing your jobs into multiple batch files may lead to confusion, especially if there are many files and they depend on each other in intricate ways. It is often better to combine all such tasks in a single file and submit it several times, each time executing a different part of the task. But how can we differentiate between the various submissions of the same file? The answer is environment variables. To begin with, the PBS directive

#PBS -V

makes all variables defined in the environment from which the job is submitted available to the job. We can then pass additional variables with the -v option of qsub, followed by a comma-separated list of variable specifications of the form NAME or NAME=value.
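The effect of qsub -v STAGE=2 all.sh is that the job starts with STAGE=2 in its environment. The minimal sketch below is an assumption-laden local analogue, not PBS itself: it uses STAGE=2 bash script so it runs without a PBS server, and stage_demo.sh is a hypothetical file name. It also shows one optional safeguard, the ${STAGE:?...} expansion, which makes the script refuse to run when STAGE was not passed in.

```shell
#!/bin/bash
# Local analogue of "qsub -v STAGE=2 all.sh": put STAGE into the child's
# environment and let the child script branch on it.
# stage_demo.sh is a hypothetical name used only for this sketch.

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

cat > "$dir/stage_demo.sh" <<'EOF'
#!/bin/bash
# Abort with an error message if STAGE was not passed in.
: "${STAGE:?STAGE must be set, e.g. qsub -v STAGE=1 all.sh}"
echo "running stage $STAGE"
EOF

STAGE=2 bash "$dir/stage_demo.sh"          # prints: running stage 2
(unset STAGE; bash "$dir/stage_demo.sh") || echo "refused: STAGE unset"
```

Several variables can be passed at once in the same comma-separated form, for example -v STAGE=2,LOG=/some/path, where LOG and its value are illustrative.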
In the following script I have combined first.sh, second.sh,
third.sh and fourth.sh into a single script. Here is
the listing:
[gustav@ih1 PBS]$ cat all.sh
#PBS -S /bin/bash
#PBS -q bg
#PBS -V
LOG=/N/B/gustav/PBS/all_log
case $STAGE in
1 )
   [ -d /N/gpfs/gustav ] || mkdir /N/gpfs/gustav
   cd /N/gpfs/gustav
   rm -f test
   echo "/N/gpfs/gustav prepared and cleaned."
   ssh ih1 "cd PBS; /usr/pbs/bin/qsub -v STAGE=2 all.sh"
   echo "Stage 2 submitted."
   ;;
2 )
   cd /N/gpfs/gustav
   time mkrandfile -f test -l 1000
   ls -l test
   echo "File /N/gpfs/gustav/test generated."
   ssh ih1 "cd PBS; /usr/pbs/bin/qsub -v STAGE=3 all.sh"
   echo "Stage 3 submitted."
   ;;
3 )
   cd /N/gpfs/gustav
   time xrandfile -f test -l 4
   echo "File /N/gpfs/gustav/test processed."
   ssh ih1 "cd PBS; /usr/pbs/bin/qsub -v STAGE=4 all.sh"
   echo "Stage 4 submitted."
   ;;
4 )
   cd /N/gpfs/gustav
   rm -f test
   echo "Directory /N/gpfs/gustav cleaned."
   ;;
esac >> $LOG 2>&1
exit 0
[gustav@ih1 PBS]$

The action taken by the script depends on the value of
STAGE. If $STAGE is 1, we prepare /N/gpfs/gustav and then submit the same script again, this time setting STAGE to 2 with the -v STAGE=2 option. In STAGE 2 we generate the data file with mkrandfile and resubmit the same script with -v STAGE=3. In STAGE 3 we process the data file with xrandfile and resubmit the script with -v STAGE=4. Finally, in STAGE 4 we clean /N/gpfs/gustav and exit. If STAGE is not set at all, no branch of the case statement matches and the job exits without doing anything.
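The stage chain can be traced without a PBS server using the dry-run sketch below. It is a hypothetical illustration, not part of all.sh: the function submit_next stands in for the ssh/qsub resubmission and simply runs the next stage in the same shell. It also adds a catch-all branch, which all.sh does not have, so an unexpected STAGE value is reported instead of being silently ignored.

```shell
#!/bin/bash
# Dry run of the four-stage chain used by all.sh (hypothetical sketch).
# The real script resubmits itself with "qsub -v STAGE=N all.sh"; here
# submit_next calls run_stage directly so the control flow can be traced locally.

submit_next() {
    # Stand-in for: ssh ih1 "cd PBS; /usr/pbs/bin/qsub -v STAGE=$1 all.sh"
    run_stage "$1"
}

run_stage() {
    local STAGE=$1
    case $STAGE in
    1) echo "stage 1: prepare";  submit_next 2 ;;
    2) echo "stage 2: generate"; submit_next 3 ;;
    3) echo "stage 3: process";  submit_next 4 ;;
    4) echo "stage 4: clean" ;;
    *) echo "unknown STAGE '$STAGE'" >&2; return 1 ;;
    esac
}

# Start at stage 1 unless STAGE is already set in the environment.
run_stage "${STAGE:-1}"
```

Unlike this dry run, the real stages are separate PBS jobs, so each one gets a fresh environment containing only what #PBS -V and -v provide.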
Observe that all output is collected in a single file, /N/B/gustav/PBS/all_log, to which each instance of the script appends its output (this is what >> $LOG does). Also observe that we capture standard error in this file too. This is what the redirection 2>&1 means: it sends file descriptor 2 (standard error) to wherever file descriptor 1 (standard output) currently points, which at that moment is the log file.
Because we collect all standard output and standard error ourselves, the output and error files generated by PBS will be empty.
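The following minimal sketch reproduces the logging pattern of all.sh on a small scale. A temporary file stands in for /N/B/gustav/PBS/all_log, and the two runs of the loop play the role of two instances of the script appending to the same log. The comments note why the order of the two redirections matters.

```shell
#!/bin/bash
# Append both stdout and stderr of a compound command to one log,
# as all.sh does with "esac >> $LOG 2>&1".
# The temporary file stands in for /N/B/gustav/PBS/all_log.

LOG=$(mktemp)
trap 'rm -f "$LOG"' EXIT

for run in 1 2; do
    {
        echo "run $run: normal output"        # written to stdout
        echo "run $run: error message" >&2    # written to stderr
    } >> "$LOG" 2>&1   # first point stdout at LOG, then copy stdout onto stderr
done

# Redirections are processed left to right. Writing "2>&1 >> $LOG" instead
# would copy the ORIGINAL stdout onto stderr before stdout is redirected,
# so error messages would not reach the log.
cat "$LOG"
```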
The job is submitted as follows:

[gustav@ih1 PBS]$ qsub -v STAGE=1 all.sh
14204.ih1.avidd.iu.edu
[gustav@ih1 PBS]$

We will then see all.sh in the qstat listing, but with a changing ID number, since each stage runs as a new job:

[gustav@ih1 PBS]$ while sleep 10
> do
> qstat | grep gustav
> done
14205.ih1 all.sh gustav 0 R bg
...
14205.ih1 all.sh gustav 00:00:13 R bg
...
14205.ih1 all.sh gustav 00:00:34 R bg
...
14205.ih1 all.sh gustav 00:00:57 R bg
...
14208.ih1 all.sh gustav 0 R bg
...
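The monitoring loop above runs until it is interrupted by hand. A sketch of a loop that stops on its own once no job with a given name remains is shown below. It assumes, as in the listing above, that qstat prints one job per line with the job name and owner as separate fields; qstat may truncate long job names, in which case the name passed in must match the truncated form. Because each stage submits the next one before it finishes, the chain should always have an all.sh job listed until stage 4 ends. The function names job_listed and wait_for_job_name are hypothetical.

```shell
#!/bin/bash
# Poll qstat until no job with the given name remains for the current user.
# Assumes qstat output has one job per line containing the name and owner.

job_listed() {
    # Succeeds if some qstat line mentions both $USER and the job name as a word.
    local name=$1
    qstat 2>/dev/null | grep -F -- "$USER" | grep -qwF -- "$name"
}

wait_for_job_name() {
    local name=$1 interval=${2:-10}
    while job_listed "$name"; do
        qstat 2>/dev/null | grep -F -- "$USER"   # show progress, like the loop above
        sleep "$interval"
    done
    echo "No $name jobs left for $USER."
}
```

For the chain above this would be called as wait_for_job_name all.sh 10.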
Eventually all four stages complete, and all_log contains the following:
[gustav@ih1 PBS]$ cat all_log
/N/gpfs/gustav prepared and cleaned.
14205.ih1.avidd.iu.edu
Stage 2 submitted.
writing on test
writing 1000 blocks of 1048576 random integers

real 5m36.706s
user 0m38.510s
sys 0m18.780s
-rw-r--r-- 1 gustav ucs 4194304000 Sep 13 16:35 test
File /N/gpfs/gustav/test generated.
14208.ih1.avidd.iu.edu
Stage 3 submitted.
reading test
reading in chunks of size 16777216 bytes
allocated 16777216 bytes to junk
read 4194304000 bytes

real 0m33.902s
user 0m0.010s
sys 0m12.210s
File /N/gpfs/gustav/test processed.
14209.ih1.avidd.iu.edu
Stage 4 submitted.
Directory /N/gpfs/gustav cleaned.
[gustav@ih1 PBS]$
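The numbers in the log can be cross-checked with a little shell arithmetic. The sketch below infers, from the log alone, that each random integer occupies 4 bytes (4194304000 bytes for 1000 blocks of 1048576 integers), and that the 16777216-byte read chunk corresponds to 4 such blocks, which suggests that -l 4 to xrandfile means 4 blocks per read; neither inference is documented here. The throughput figures are rough, since the "real" times also include random-number generation and other work.

```shell
#!/bin/bash
# Cross-check the sizes reported in all_log (inferences, not documented facts).

blocks=1000
ints_per_block=1048576
file_bytes=4194304000          # size reported by ls -l
chunk_bytes=16777216           # read chunk reported by xrandfile

echo "bytes per integer: $(( file_bytes / (blocks * ints_per_block) ))"   # prints: bytes per integer: 4
echo "blocks per read chunk: $(( chunk_bytes / (ints_per_block * 4) ))"   # prints: blocks per read chunk: 4

# Rough throughput in MB/s (1 MB = 10^6 bytes) from the "real" times above.
awk -v b="$file_bytes" 'BEGIN {
    printf "write: %.1f MB/s\n", b / (5*60 + 36.706) / 1e6
    printf "read:  %.1f MB/s\n", b / 33.902 / 1e6
}'
```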