
PBS Configuration

The first thing we are going to look at is the PBS configuration. Here you want to find out the names of the queues that have been configured on the system. Queues are usually associated with particular resources. For example, there may be queues for short jobs and queues for long jobs. There may be queues for sequential jobs and queues for parallel jobs. There may be queues for exclusive jobs, i.e., jobs that want a whole node to themselves, and queues for jobs that don't mind sharing nodes with other jobs. There may be queues for jobs that require a lot of memory and queues for jobs that don't need that much, and queues for various architectures as well, because PBS can be used to manage a heterogeneous cluster.

The command you can use to find out about the PBS configuration is qstat. In particular, qstat -q will tell you about the queues and their parameters.

Running this command on avidd-b.iu.edu returns:

[gustav@bh1 gustav]$ qstat -q

server: bh1

Queue            Memory CPU Time Walltime Node Run Que Lm  State
---------------- ------ -------- -------- ---- --- --- --  -----
bg                 --      --       --     --   17   3 --   E R
                                               --- ---
                                                17   3
[gustav@bh1 gustav]$
This means that there is only one queue configured on avidd-b at the time I'm writing this tutorial. The queue has no memory limit, no CPU time limit, no wall time limit and no node number limit either.
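
If you want just the queue names, for example to loop over them in a script, you can filter the listing with awk. What follows is only a sketch: the function name queue_names is made up for this illustration, and the filter assumes the layout shown above, namely that a dashed line opens the table and that the totals underneath it are indented.

```shell
# Hypothetical helper (not part of PBS): print the queue names found in
# `qstat -q` output supplied on standard input.
queue_names() {
    awk '
        /^-+ /              { in_table = 1; next }  # dashed line under the headings
        in_table && /^[^ ]/ { print $1 }            # queue rows start in column one
    '
}

# Demonstration on the bh1 listing above.
# On a live system you would type: qstat -q | queue_names
queue_names <<'EOF'

server: bh1

Queue            Memory CPU Time Walltime Node Run Que Lm  State
---------------- ------ -------- -------- ---- --- --- --  -----
bg                 --      --       --     --   17   3 --   E R
                                               --- ---
                                                17   3
EOF
```

The indented totals lines are skipped because they begin with spaces, so for the listing above the function prints the single name bg.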

You can use the same command to find out about the queues configured on avidd-i.iu.edu as follows:

[gustav@bh1 gustav]$ ssh ih1 qstat -q

server: ih1

Queue            Memory CPU Time Walltime Node Run Que Lm  State
---------------- ------ -------- -------- ---- --- --- --  -----
bg                 --      --       --     --  182 343 --   E R
                                               --- ---
                                               182 343
[gustav@bh1 gustav]$
Well, there is also only one queue configured there, which has the same name (but it is not the same queue, mind you, because it is configured on a different system). That queue has no memory, CPU time, wall clock time or node number limits either.

This is not common on HPC systems, unless you have one that's used little or by a small number of users. You will find, if you connect to our SP, that there are a great many queues configured there.

You can see which jobs are currently on the system, running or queued, by typing qstat without any options. For example, on avidd-b we have:

[gustav@bh1 gustav]$ qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
10177.bh1        calmob           surajago         107:06:4 R bg              
11137.bh1        nokkmmcalmob     surajago                0 Q bg              
11617.bh1        klv_ptanal_lopt  kevidale         31:54:37 R bg              
11628.bh1        klv_ptanal_lopt  kevidale         31:51:01 R bg              
11629.bh1        klv_ptanal_lopt  kevidale         31:43:24 R bg              
11630.bh1        klv_ptanal_dach  kevidale         32:00:26 R bg              
11635.bh1        yeast3           jfan                    0 Q bg              
11970.bh1        c8_scan          steige           17:41:52 R bg              
12020.bh1        klv_ptanal_dach  kevidale         11:21:29 R bg              
12082.bh1        helB1            mkohtani         08:33:18 R bg              
12150.bh1        STDIN            ivdgl            00:00:00 R bg              
12151.bh1        STDIN            ivdgl            00:00:00 R bg              
12152.bh1        STDIN            ivdgl            00:00:00 R bg              
12153.bh1        STDIN            ivdgl            00:00:00 R bg              
12154.bh1        myscript.ll      mperezga                0 Q bg              
12155.bh1        STDIN            ivdgl            00:00:00 R bg              
12182.bh1        STDIN            ivdgl            00:00:00 R bg              
12183.bh1        STDIN            ivdgl            00:00:00 R bg              
12184.bh1        STDIN            ivdgl            00:00:00 R bg              
12185.bh1        STDIN            ivdgl            00:00:00 R bg              
[gustav@bh1 gustav]$
The jobs have various ID numbers listed in the first column. These numbers are important, because you have to use them in order to cancel a job, or to postpone it, or to do something else with it. The second column lists the names given to the jobs by their originators. The names need not be unique; they are up to the users themselves. The user names are listed in the third column. The remaining columns show the CPU time used so far, the job state (R for running, Q for queued) and the queue the job belongs to.
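
Because you need the ID to do anything with a job, it is handy to be able to pull out the IDs that belong to one user. Here is a minimal sketch. The function name job_ids_for_user is invented for this example, and it assumes that the listing begins with the two header lines shown above and that job names contain no blanks, so that the user name is always the third field.

```shell
# Hypothetical helper (not part of PBS): print the IDs of the jobs owned by
# the user named in $1, reading plain `qstat` output from standard input.
job_ids_for_user() {
    # NR > 2 skips the heading and the dashed line beneath it.
    awk -v user="$1" 'NR > 2 && $3 == user { print $1 }'
}

# Demonstration on three lines copied from the listing above.
# On a live system you would type: qstat | job_ids_for_user jfan
job_ids_for_user jfan <<'EOF'
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
11630.bh1        klv_ptanal_dach  kevidale         32:00:26 R bg
11635.bh1        yeast3           jfan                    0 Q bg
11970.bh1        c8_scan          steige           17:41:52 R bg
EOF
```

For this sample the function prints 11635.bh1, the only job that belongs to jfan.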

You can see that right now only seven users have PBS jobs on avidd-b:

[gustav@bh1 gustav]$ qstat | grep -v '^Job id' \
   | grep -v '^---' | awk ' { print $3 } ' | sort -u
ivdgl
jfan
kevidale
mkohtani
mperezga
steige
surajago
[gustav@bh1 gustav]$
and only two users on avidd-i:
[gustav@bh1 gustav]$ ssh ih1 qstat | grep -v '^Job id' | grep -v '^---' \
   | awk ' { print $3 } ' | sort -u 
heap
jfan
[gustav@bh1 gustav]$
But the two avidd-i users have submitted 518 jobs to PBS:
[gustav@bh1 gustav]$ ssh ih1 qstat | grep -v '^Job id' | grep -v '^---' | wc   
    518    3108   40922
[gustav@bh1 gustav]$
whereas the seven users on avidd-b have only 20 jobs under PBS management:
[gustav@bh1 gustav]$ qstat | grep -v '^Job id' | grep -v '^---' | wc
     20     120    1580
[gustav@bh1 gustav]$
The command qstat -q tells us the same thing, without the need for piping and filtering:
[gustav@bh1 gustav]$ qstat -q

server: bh1

Queue            Memory CPU Time Walltime Node Run Que Lm  State
---------------- ------ -------- -------- ---- --- --- --  -----
bg                 --      --       --     --   17   3 --   E R
                                               --- ---
                                                17   3
[gustav@bh1 gustav]$
It tells us that 17 jobs are running at present and 3 are queued. The 3 queued jobs are probably not waiting for lack of resources. Rather, they may depend on some of the jobs that are currently executing.
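
If you want to know who the running and queued jobs belong to, you can get a per-user breakdown from the plain qstat listing in a single awk pass. This is a sketch only: the function name jobs_per_user is made up, and the code assumes the column layout shown earlier, with the user in the third field and the state letter in the fifth, and job names without blanks.

```shell
# Hypothetical helper (not part of PBS): for every user in plain `qstat`
# output on standard input, count the running (R) and queued (Q) jobs.
jobs_per_user() {
    awk '
        NR > 2 {                      # skip the two header lines
            seen[$3] = 1
            if ($5 == "R") running[$3]++
            if ($5 == "Q") queued[$3]++
        }
        END {
            for (u in seen)
                printf "%s running=%d queued=%d\n", u, running[u], queued[u]
        }
    ' | sort
}

# Demonstration on five lines copied from the avidd-b listing.
# On a live system you would type: qstat | jobs_per_user
jobs_per_user <<'EOF'
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
10177.bh1        calmob           surajago         107:06:4 R bg
11137.bh1        nokkmmcalmob     surajago                0 Q bg
11617.bh1        klv_ptanal_lopt  kevidale         31:54:37 R bg
11628.bh1        klv_ptanal_lopt  kevidale         31:51:01 R bg
11635.bh1        yeast3           jfan                    0 Q bg
EOF
```

Adding up the running and queued columns over all users should reproduce the Run and Que totals that qstat -q prints, provided that no job changes state between the two commands.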

There are many more options to qstat, and we are going to learn about some of them as we need them. But qstat and qstat -q will do for the time being.


Zdzislaw Meglicki
2004-04-29