
LoadLeveler Configuration

Before you can begin working with LoadLeveler on the SP, you must first find out how it is configured. And before you can understand how LoadLeveler has been configured on the SP, you must find out how the SP itself is configured. You can do this by typing:

gustav@sp20:../gustav 17:35:47 !509 $ jm_status -P
Pool 0:    Batch_only_SP_nodes
  Subpool: BATCH
    Node:  sp01.ucs.indiana.edu
    Node:  sp02.ucs.indiana.edu
    Node:  sp03.ucs.indiana.edu
    ...
    Node:  sp45.ucs.indiana.edu
    Node:  sp46.ucs.indiana.edu
    Node:  sp47.ucs.indiana.edu
gustav@sp20:../gustav 17:35:51 !510 $
This command interrogates the SP Job Manager, also called the Resource Manager. The -P option lists the pools of processors configured into the system. In our case there is just one pool, which comprises 47 P2SC nodes. Since each of those delivers some 700 MFLOPS peak, you have nearly 33 GFLOPS of peak computing power available.
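
The aggregate figure quoted above is just the node count multiplied by the per-node peak. A minimal sketch of that arithmetic, using only the numbers given in the text (the function name is made up for illustration):

```python
# Figures taken from the text: 47 P2SC nodes, roughly 700 MFLOPS peak each.
NODES = 47
PEAK_MFLOPS_PER_NODE = 700


def aggregate_peak_gflops(nodes: int, mflops_per_node: float) -> float:
    """Return the theoretical aggregate peak in GFLOPS (1 GFLOPS = 1000 MFLOPS)."""
    return nodes * mflops_per_node / 1000.0


if __name__ == "__main__":
    # Prints "32.9 GFLOPS", i.e. nearly 33 GFLOPS.
    print(f"{aggregate_peak_gflops(NODES, PEAK_MFLOPS_PER_NODE):.1f} GFLOPS")
```

This is a peak figure only; sustained performance of real jobs will be lower.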

Now, once you know what's out there, you can ask LoadLeveler how those resources can be accessed. The command that will tell you that is

gustav@sp20:../gustav 17:36:28 !511 $ llclass
Name           MaxJobCPU     MaxProcCPU  Free   Max Description
              d+hh:mm:ss     d+hh:mm:ss Slots Slots 

b                     -1             -1    15    24 long serial jobs
l                     -1             -1     5     5 large-memory serial jobs
qcd                   -1             -1     1     1 Quantum Chemistry Division
test          0+00:05:00     0+00:05:00     8     8 5-minute test jobs
q             0+01:00:00     0+01:00:00     2     2 quick serial jobs
a             1+00:00:00     1+00:00:00     4     6 short serial jobs
stat          1+12:00:00     1+12:00:00     3     3 statistics jobs
pa            1+12:00:00     1+12:00:00    12    12 short parallel jobs
math          1+12:00:00     1+12:00:00     3     3 mathematics jobs
pb                    -1             -1     0    32 long parallel jobs
gustav@sp20:../gustav 17:54:17 !512 $
This time LoadLeveler tells us that we have 10 classes. LoadLeveler classes correspond closely to queues in systems such as NQS and, indeed, there is a queue associated with every class.

Class pb has up to 32 slots, of which, according to the listing, none is available at present. Those 32 slots are 32 job instances. That is, the class allows you to run either up to 32 serial jobs or, say, 2 parallel jobs, each running on 16 processors.

Classes b, l, qcd, and pb are CPU-time unlimited. This means that you can submit, for example, a 32-way parallel job to class pb that may run forever. This may be rather antisocial, but LoadLeveler configuration allows you to do just that.

Class test is for test runs only, i.e., for very short jobs, just long enough to check that your program has been correctly linked and that it runs. Then we have classes q through math, which are for jobs that take between 1 hour and 1.5 days of CPU time.
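
The limits in the llclass listing use the format d+hh:mm:ss, with -1 standing for "no limit". The following Python sketch converts those fields to seconds and picks out the unlimited classes. The helper name parse_limit is hypothetical, and the values are copied by hand from the MaxJobCPU column of the listing above rather than read from a live llclass call.

```python
import re
from typing import Optional

# d+hh:mm:ss, as printed in the MaxJobCPU and MaxProcCPU columns.
LIMIT_RE = re.compile(r"^(\d+)\+(\d{2}):(\d{2}):(\d{2})$")


def parse_limit(field: str) -> Optional[int]:
    """Convert a limit such as '1+12:00:00' to seconds; '-1' (unlimited) gives None."""
    field = field.strip()
    if field == "-1":
        return None
    match = LIMIT_RE.match(field)
    if match is None:
        raise ValueError(f"unrecognised limit: {field!r}")
    days, hours, minutes, seconds = (int(g) for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# MaxJobCPU values copied from the llclass listing.
MAX_JOB_CPU = {
    "b": "-1", "l": "-1", "qcd": "-1", "test": "0+00:05:00",
    "q": "0+01:00:00", "a": "1+00:00:00", "stat": "1+12:00:00",
    "pa": "1+12:00:00", "math": "1+12:00:00", "pb": "-1",
}

# Classes with no CPU-time limit.
unlimited = sorted(name for name, value in MAX_JOB_CPU.items()
                   if parse_limit(value) is None)

if __name__ == "__main__":
    print(unlimited)
    print(parse_limit(MAX_JOB_CPU["q"]), parse_limit(MAX_JOB_CPU["math"]))
```

Here parse_limit("0+01:00:00") is 3600 seconds (1 hour) and parse_limit("1+12:00:00") is 129600 seconds (1.5 days), which are the two ends of the range covered by classes q through math.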

In order to find more information about any particular class, you can call llclass with the -l switch, e.g.:

gustav@sp20:../gustav 17:54:17 !512 $ llclass -l pa
=============== Class pa ==========
                Name: pa
            priority: 40
               admin: 
           NQS_class: F
          NQS_submit: 
           NQS_query: 
      max_processors: 8
             maxjobs: -1
       class_comment: short parallel jobs
    wall_clock_limit: -1, -1
       job_cpu_limit:   1+12:00:00, -1
           cpu_limit:   1+12:00:00, -1
          data_limit: -1, -1
          core_limit: -1, -1
          file_limit: -1, -1
         stack_limit: -1, -1
           rss_limit: -1, -1
                nice: 0
                free: 12
             maximum: 12
gustav@sp20:../gustav 18:19:58 !513 $
Here you can see that even though there are 12 slots in this class, the maximum number of processors you can request is 8. The CPU limit is cumulative, i.e., if you run a job on 8 CPUs and they all consume CPU time equally, the CPU time allowance per processor will be 4 hours and 30 minutes (1 day and 12 hours divided by 8).
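
The per-processor allowance can be checked with a few lines of Python. This is only a sketch of the arithmetic under the assumption stated in the text, namely that all processors consume CPU time equally; the function name is made up for illustration.

```python
from datetime import timedelta


def per_processor_allowance(job_cpu_limit: timedelta, processors: int) -> timedelta:
    """Split a cumulative job CPU limit evenly across the processors of a job."""
    if processors < 1:
        raise ValueError("a job needs at least one processor")
    return job_cpu_limit / processors


# job_cpu_limit of class pa is 1+12:00:00; max_processors is 8.
PA_JOB_CPU_LIMIT = timedelta(days=1, hours=12)

if __name__ == "__main__":
    # Prints 4:30:00
    print(per_processor_allowance(PA_JOB_CPU_LIMIT, 8))
```

A job that requests fewer processors gets a correspondingly larger share per processor, e.g., 9 hours each on 4 processors.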

If you run llclass -l on the test class, you'll see that it has a higher priority than the pa class. The two classes share processors, so if two jobs are submitted at the same time, one to pa and the other to test, it is the test job that will run first, unless users alter the priorities of those jobs explicitly. A user can do that, but the user priority usually carries less weight than the system priority.
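
To see why the class priority decides the order, it helps to evaluate the SYSPRIO expression that llstatus -l reports for our machines in this section, (ClassSysprio * 100) - QDate. The sketch below does that in Python under several assumptions: that ClassSysprio is the class priority shown by llclass -l (40 for pa), that QDate is a number of seconds that is the same for two jobs queued at the same moment, and that the test class has priority 50. The last value is hypothetical, since the listing for test is not reproduced here.

```python
def sysprio(class_sysprio: int, qdate: int) -> int:
    """Evaluate SYSPRIO = (ClassSysprio * 100) - QDate, as configured on our SP."""
    return class_sysprio * 100 - qdate


PA_PRIORITY = 40    # from "llclass -l pa"
TEST_PRIORITY = 50  # hypothetical: the text only says it is higher than pa's

# Two jobs queued at the same moment share the same QDate (value is hypothetical).
QDATE = 3600

priorities = {
    "pa": sysprio(PA_PRIORITY, QDATE),
    "test": sysprio(TEST_PRIORITY, QDATE),
}

# The job with the larger system priority is considered first.
first = max(priorities, key=priorities.get)

if __name__ == "__main__":
    print(priorities, "->", first)
```

With equal QDate values the comparison reduces to the class priorities alone, so the test job comes out ahead.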

How do you find out which class runs on which nodes? A first step is to run the command llstatus:

gustav@sp20:../SP 18:32:22 !544 $ llstatus
Name                      Schedd  InQ Act Startd Run LdAvg Idle Arch      OpSys    
libra.ucs.indiana.edu     Avail     2   2 Idle     0 0.09  2112 R6000     AIX43    
sp01.ucs.indiana.edu      Avail    36   8 Run      1 1.10     1 R6000     AIX43    
sp02.ucs.indiana.edu      Avail     5   4 Run      1 1.02  7801 R6000     AIX43    
sp03.ucs.indiana.edu      Avail     0   0 Run      1 1.08  2952 R6000     AIX43    
...
sp44.ucs.indiana.edu      Avail     0   0 Run      1 1.00  9112 R6000     AIX43    
sp45.ucs.indiana.edu      Avail     0   0 Run      1 1.00  9999 R6000     AIX43    
sp46.ucs.indiana.edu      Avail     0   0 Busy     2 2.03  9999 R6000     AIX43    
sp47.ucs.indiana.edu      Avail     0   0 Run      1 1.04  9999 R6000     AIX43    

R6000/AIX43           48 machines  44 jobs  43 running
Total Machines        48 machines  44 jobs  43 running

The Central Manager is defined on sp01.ucs.indiana.edu

All machines on the machine_list are present
gustav@sp20:../SP 18:32:24 !545 $
When called without any options, llstatus simply lists all machines under LoadLeveler management. Observe that although our SP has 47 nodes, LoadLeveler manages 48 machines. The 48th machine is libra.ucs.indiana.edu. The listing tells you whether a machine is busy or idle, what its average load is, what its architecture and operating system are, and whether the LoadLeveler scheduler runs on that node.
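
If you want to process such a listing rather than read it, the rows split cleanly on white space. The following Python sketch parses a few rows copied from the listing above and counts machines by their Startd state. The field names are our own labels for the ten columns, and the parser is an illustration, not part of LoadLeveler.

```python
from collections import Counter

# Header and a few rows copied from the llstatus listing.
SAMPLE = """\
Name                      Schedd  InQ Act Startd Run LdAvg Idle Arch      OpSys
libra.ucs.indiana.edu     Avail     2   2 Idle     0 0.09  2112 R6000     AIX43
sp01.ucs.indiana.edu      Avail    36   8 Run      1 1.10     1 R6000     AIX43
sp02.ucs.indiana.edu      Avail     5   4 Run      1 1.02  7801 R6000     AIX43
sp46.ucs.indiana.edu      Avail     0   0 Busy     2 2.03  9999 R6000     AIX43
"""

FIELDS = ("name", "schedd", "inq", "act", "startd",
          "run", "ldavg", "idle", "arch", "opsys")


def parse_llstatus(text):
    """Return one dict per machine row; header and summary lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # blank line or summary line
        row = dict(zip(FIELDS, parts))
        try:
            for key in ("inq", "act", "run", "idle"):
                row[key] = int(row[key])
            row["ldavg"] = float(row["ldavg"])
        except ValueError:
            continue  # the column header has ten words but no numbers
        rows.append(row)
    return rows


rows = parse_llstatus(SAMPLE)
startd_states = Counter(row["startd"] for row in rows)

if __name__ == "__main__":
    print(startd_states)
```

On the machine itself the same function could be fed the captured output of llstatus instead of the SAMPLE string.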

When invoked with the -l option, the command llstatus returns a very detailed listing for each machine that is managed by LoadLeveler. If you don't want to look at all nodes, you can select just one by providing its name on the command line:

gustav@sp20:../SP 18:38:13 !551 $ llstatus -l sp20 
name: "sp20.ucs.indiana.edu"
machine_context:
Running             = 0
ScheddAvail         = 1
StartdAvail         = 1
State               = Idle
ScheddState         = 0
OpSys               = AIX43
Arch                = R6000
Machine             = sp20.ucs.indiana.edu
START               = T
SUSPEND             = F
CONTINUE            = T
VACATE              = F
KILL                = F
SYSPRIO             = ((ClassSysprio *  100) -  QDate)
MACHPRIO            = (0 -  (1000 *  (LoadAvg /  Speed)))
VirtualMemory       = 105392
EnteredCurrentState = Tue Jan  5 12:17:37 1999
Disk                = 13072
Tmp                 = 197736
KeyboardIdle        = 42
LoadAvg             = 0.000092
AvailableClasses    = { "pa" "test" }
DrainingClasses     = { }
DrainedClasses      = { }
Pool                = 0
Adapter             = { "ethernet" "hps_user" "hps_ip" }
ConfiguredClasses   = { "pa" "test" }
Feature             = { "256MB" "afs" }
ProtocolVersion     = 1
CkptVersion         = 1
Memory              = 256
Max_Starters        = 2
ConfigTimeStamp     = Tue Jan  5 12:16:32 1999
Cpus                = 1
Speed               = 3.000000
MasterMachPriority  = 0.000000
Subnet              = 129.79.7
CustomMetric        = 1
ScheddRunning       = 0
Pending             = 0
Starting            = 0
Idle                = 0
Unexpanded          = 0
Held                = 0
Removed             = 0
RemovePending       = 0
Completed           = 2
DependantNotRun     = 0
TotalJobs           = 0
time_stamp: Tue Jan 12 18:37:42 1999

gustav@sp20:../SP 18:38:19 !552 $
There is quite a lot of information in this listing. In particular, you'll see the entry ConfiguredClasses, which in this case is { "pa" "test" }. This means that a job submitted to pa or to test may end up running on that node, or on some other node that has pa or test in its ConfiguredClasses entry.
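
The long listing is a sequence of "key = value" lines, where list values are written as quoted strings inside braces. A short Python sketch, with a handful of lines copied from the listing above, shows how to check whether a node is configured for a given class. The function name parse_machine_context is made up for illustration.

```python
import re

# A few lines copied from the "llstatus -l sp20" listing.
SAMPLE = '''\
State               = Idle
AvailableClasses    = { "pa" "test" }
DrainedClasses      = { }
ConfiguredClasses   = { "pa" "test" }
Feature             = { "256MB" "afs" }
Max_Starters        = 2
'''


def parse_machine_context(text):
    """Map each key to a string, or to a list of strings for { ... } values."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # lines such as 'name: ...' carry no '=' and are skipped
        key, value = key.strip(), value.strip()
        if value.startswith("{") and value.endswith("}"):
            record[key] = re.findall(r'"([^"]*)"', value)
        else:
            record[key] = value
    return record


record = parse_machine_context(SAMPLE)
runs_pa = "pa" in record["ConfiguredClasses"]

if __name__ == "__main__":
    print(record["ConfiguredClasses"], runs_pa)
```

Values are kept as strings, so Max_Starters comes back as "2"; convert it with int() if you need a number.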

It would be good, however, if we could ask LoadLeveler about a particular class and find out which nodes it runs on. The command llclass should do that, but it doesn't. So on our system we have our own local command, llconfig, which prints a more palatable summary:

gustav@sp20:../SP 18:54:26 !556 $ llconfig

        LoadLeveler Configuration on the SP     


                                 Total  
   Node       Job Classes        Jobs  Features

  libra         q                  2   512MB
  sp01          l,b                2   512MB  
  sp02          l,b                2   512MB  
  sp03          l,pb               2   512MB  
  sp04          l,pb               2   512MB  
  sp05          stat,pb            2   256MB  gauss glim lisrel prelis rats sas spss tsp
  sp06          stat,pb            2   256MB  gauss glim rats sas spss tsp
  sp07          stat,pb            2   256MB  glim rats sas spss tsp
  sp08          b,pb               2   256MB  
  sp09          math,pb            2   256MB  lindo lingo maple math matlab
  sp10          math,pb            2   256MB  lindo lingo maple matlab
  sp11          math,pb            2   256MB  lindo lingo maple matlab
  sp12          b,pb               2   256MB  
  sp13          b,pb               2   256MB  
  sp14          b,pb               2   256MB  
  sp15          b,pb               2   256MB  naglib
  sp16          b,pb               2   256MB  naglib
  sp17          pa,test            2   256MB  afs 
  sp18          pa,test            2   256MB  afs
  sp19          pa,test            2   256MB  afs
  sp20          pa,test            2   256MB  afs
  sp21          pa,test            2   256MB  afs
  sp22          pa,test            2   256MB  afs
  sp23          pa,test            2   256MB  afs
  sp24          pa,test            2   256MB  afs
  sp25          l,qcd              2   512MB        
  sp26          b,pb               2   256MB  bigscr naglib
  sp27          b,pb               2   256MB  bigscr naglib
  sp28          b,pb               2   256MB  bigscr naglib
  sp29          b,pb               2   256MB  bigscr naglib
  sp30          b,pb               2   256MB  bigscr naglib
  sp31          b,pb               2   256MB  bigscr naglib
  sp32          b,pb               2   256MB  bigscr naglib
  sp33          b,pb               2   256MB  bigscr naglib
  sp34          b,pb               2   256MB  bigscr naglib
  sp35          b,pb               2   256MB  bigscr naglib
  sp36          b,pb               2   256MB  bigscr naglib
  sp37          b,pb               2   256MB  bigscr naglib
  sp38          b,pb               2   256MB  bigscr naglib
  sp39          b,pb               2   256MB  bigscr naglib
  sp40          a,pa               2   256MB  afs bigscr
  sp41          a,pa               2   256MB  afs bigscr
  sp42          a,pa               2   256MB  afs bigscr
  sp43          a,pa               2   256MB  afs bigscr
  sp44          a,pb               2   256MB  bigscr
  sp45          a,pb               2   256MB  
  sp46          b,pb               2   256MB  bigscr
  sp47          b,pb               2   256MB  bigscr

  Maximum Processor Limits 

  class pa              8
  class test            8
  class pb             32
  all other classes     1

  Memory 

  42 nodes have 256MB memory.  The 6 nodes with 512MB memory can    
  be selected by feature code, providing the appropriate class is
  also specified.    
gustav@sp20:../SP 18:55:16 !557 $
To search for more specific information you can always use grep, for example:

gustav@sp20:../SP 18:55:16 !557 $ llconfig | grep pa
  sp17          pa,test            2   256MB  afs 
  sp18          pa,test            2   256MB  afs
  sp19          pa,test            2   256MB  afs
  sp20          pa,test            2   256MB  afs
  sp21          pa,test            2   256MB  afs
  sp22          pa,test            2   256MB  afs
  sp23          pa,test            2   256MB  afs
  sp24          pa,test            2   256MB  afs
  sp40          a,pa               2   256MB  afs bigscr
  sp41          a,pa               2   256MB  afs bigscr
  sp42          a,pa               2   256MB  afs bigscr
  sp43          a,pa               2   256MB  afs bigscr
  class pa              8
gustav@sp20:../SP 18:56:19 !558 $
And this clearly tells us that class pa runs on sp17 through sp24 and on sp40 through sp43. The listing also tells us that all those nodes run AFS. They are often used for Computer Science experiments, and you can expect to find various other goodies installed there soon, e.g., DFS, HPSS, and GPFS.
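
Note that grep pa matches any line containing the letters "pa", so it works here only because no other line of the summary happens to contain them. A more robust way to answer the question "which nodes does a class run on?" is to split the Job Classes column and match class names exactly. The Python sketch below does this for a few lines copied from the llconfig summary. Since llconfig is our local command, the column layout assumed by the regular expression is specific to our system, and the function name is made up for illustration.

```python
import re
from collections import defaultdict

# A few lines copied from the llconfig summary.
SAMPLE = """\
   Node       Job Classes        Jobs  Features
  libra         q                  2   512MB
  sp16          b,pb               2   256MB  naglib
  sp17          pa,test            2   256MB  afs
  sp24          pa,test            2   256MB  afs
  sp40          a,pa               2   256MB  afs bigscr
  sp44          a,pb               2   256MB  bigscr
  class pa              8
"""

# node, comma-separated classes, total jobs, memory feature, other features
ROW_RE = re.compile(r"^\s*(\S+)\s+([a-z]+(?:,[a-z]+)*)\s+(\d+)\s+(\d+MB)\b\s*(.*)$")


def nodes_by_class(text):
    """Invert the node -> classes table into a class -> nodes mapping."""
    result = defaultdict(list)
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match is None:
            continue  # headers, processor limits, memory notes
        node = match.group(1)
        for job_class in match.group(2).split(","):
            result[job_class].append(node)
    return dict(result)


table = nodes_by_class(SAMPLE)

if __name__ == "__main__":
    print(table["pa"])
```

Because classes are compared as whole names, asking for class a returns only the nodes configured for a, which a plain grep a could not do.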


Zdzislaw Meglicki
2001-02-26