
Timing a Job

Most C-language programmers probably know how to time their jobs, because functions time and clock are part of the standard C library, which is defined by the ANSI C specification.

Function time takes a pointer to time_t as an argument and returns a value of type time_t.

On the AVIDD system time_t is defined in a somewhat convoluted manner. First it is defined as equivalent to __time_t in /usr/include/time.h, and then __time_t is defined as long int in /usr/include/bits/types.h. But, as I have already remarked in section 4.3.4, long int is only a 32-bit integer on the AVIDD's IA32 nodes, as specified in /usr/include/limits.h in the #if __WORDSIZE == 64 clause. Consequently, time_t can count up to LONG_MAX, which on the 32-bit architecture is 2147483647.

If the pointer passed to function time is not NULL, the return value is also stored in the location the pointer points to. The returned value is the current calendar time in seconds since the Epoch, i.e., 00:00:00 GMT, 1 January 1970, popularly celebrated as the day UNIX was born. Because the maximum value returned is LONG_MAX, this gives us about

\begin{displaymath}\frac{2147483647\,\mathrm{s}}
{365\,\mathrm{days/year} \times 24\,\mathrm{hours/day} \times 3600\,\mathrm{seconds/hour}} \approx 68\,\mathrm{years}
\end{displaymath}

until the clock on 32-bit systems wraps around, which happens in January 2038.

If you think there won't be any 32-bit systems by that time, or that such systems won't rely on the old-fashioned UNIX function time, don't be so sure. We saw in the year 2000 how much bad code and how many antiquated computer architectures lingered in corporate computer rooms, and I am quite certain that we are going to have similar problems in 2038.

You would use function time to find out the elapsed wall-clock time. If you know that, say, your queue allows at most two wall-clock hours (7200 seconds) per job, then by checking how much wall-clock time you have used so far you also know how much time you have left.

Function clock takes no arguments and returns a value of type clock_t, which on the AVIDD IA32 nodes is defined as equivalent to __clock_t in /usr/include/time.h, and __clock_t in turn is defined as long int in /usr/include/bits/types.h. So clock_t can also count only up to $2^{31}-1 = 2147483647$.

This function returns the CPU time that has elapsed since the program started executing. The returned value is not in seconds but in clock ticks. The constant CLOCKS_PER_SEC, defined in /usr/include/time.h, tells how many ticks there are per second. So, to find out how many CPU seconds you have used so far, you divide the result of calling clock by CLOCKS_PER_SEC. This is what section 3 of the Linux manual will tell you.

But there is a slight complication here. Because of the CAE XSH convention, CLOCKS_PER_SEC is forcibly set to 1 million on all XSI-conformant systems, regardless of how fast their real clock is. So CLOCKS_PER_SEC tells us nothing about the speed of the processor, and the value returned by clock is not a count of hardware clock cycles but of nominal microseconds of CPU time. Dividing it by CLOCKS_PER_SEC still yields CPU seconds, which is all we need. Moreover, since all other standard UNIX facilities use CLOCKS_PER_SEC too, including PBS, as you will see soon enough, we can just as well stick to it. After all, we want to use it to protect our job against a possible PBS CPU-time-out.

Function clock also wraps around fairly quickly. If clock_t really counted the cycles of a present-day chip clocked at, e.g., 3 GHz, a 32-bit counter would wrap around every 0.7 seconds or so. Because CLOCKS_PER_SEC is fixed at one million, the actual situation is not quite that bad, but the counter still reaches LONG_MAX after about 2147 seconds, i.e., some 36 minutes of CPU time. Luckily, although I am going to show you how to use clock in the example below, PBS queues are seldom configured with respect to CPU time. It is usually the wall-clock time that really matters, not the CPU time, since the whole idea of a batch queue system is to protect the system against hogging (footnote 4.1). You can easily imagine a job that is IO-bound and uses very little CPU time. Such a job could hog the resources for a very long wall-clock time if the queue were CPU-time bound only.

The following example illustrates how to use functions time and clock.

/*
 * Perform 10,000,000 square roots and divisions, sleep for 5 seconds.
 * Used to exercise time and clock functions.
 *
 * %Id: c-time.c,v 1.3 2003/09/18 21:57:08 gustav Exp %
 *
 * %Log: c-time.c,v %
 * Revision 1.3  2003/09/18 21:57:08  gustav
 * *** empty log message ***
 *
 * Revision 1.2  2003/09/18 21:55:03  gustav
 *
 */

#include <time.h>   /* functions time and clock defined here */
#include <stdio.h>  /* function printf defined here */
#include <unistd.h> /* function sleep defined here */
#include <stdlib.h> /* function exit defined here */
#include <math.h>   /* function sqrt defined here */

int main(void)
{
  time_t  t0, t1; /* time_t is defined in <bits/types.h> as long */
  clock_t c0, c1; /* clock_t is defined in <bits/types.h> as long */

  long count;
  double a, b, c;

  printf ("using UNIX function time to measure wallclock time ... \n");
  printf ("using UNIX function clock to measure CPU time ... \n");

  t0 = time(NULL);
  c0 = clock();

  printf ("\tbegin (wall):            %ld seconds\n", (long) t0);
  printf ("\tbegin (CPU):             %ld cycles\n", (long) c0);

  printf ("\t\tsleep for 5 seconds ... \n");
  sleep(5);

  printf ("\t\tperform some computation ... \n");
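  /* c is never used, so an optimizing compiler may delete this loop;
     compile without -O (as in the make run below) to keep it */
  for (count = 1L; count < 10000000L; count++) {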
    a = sqrt((double)count); /* square root is a very slow operation */
    b = 1.0/a;       /* division is also a rather slow operation */
    c = b - a;       /* subtraction takes very little time */
  }

  t1 = time(NULL);
  c1 = clock();

  printf ("\tend (wall):              %ld seconds\n", (long) t1);
  printf ("\tend (CPU);               %ld cycles\n", (long) c1);
  printf ("\telapsed wall clock time: %ld seconds\n", (long) (t1 - t0));
  printf ("\telapsed CPU time:        %f seconds\n", (double) (c1 - c0) / CLOCKS_PER_SEC);

  exit(0);
}
We compile this program with a Makefile copied from our earlier programs. The only change needed is to add -lm at the end of the command line that links the program, so as to load the mathematics library, which contains the code of function sqrt.
[gustav@bh1 c-time]$ make
co  RCS/Makefile,v Makefile
RCS/Makefile,v  -->  Makefile
revision 1.1
done
co  RCS/c-time.c,v c-time.c
RCS/c-time.c,v  -->  c-time.c
revision 1.3
done
cc -c c-time.c
cc -o c-time c-time.o -lm
[gustav@bh1 c-time]$
And now we can run this program as follows:
[gustav@bh1 c-time]$ time ./c-time
using UNIX function time to measure wallclock time ... 
using UNIX function clock to measure CPU time ... 
        begin (wall):            1063922790 seconds
        begin (CPU):             0 cycles
                sleep for 5 seconds ... 
                perform some computation ... 
        end (wall):              1063922796 seconds
        end (CPU);               1080000 cycles
        elapsed wall clock time: 6 seconds
        elapsed CPU time:        1.080000 seconds

real    0m6.204s
user    0m1.080s
sys     0m0.000s
[gustav@bh1 c-time]$
Observe that the CPU time calculated by our program agrees perfectly with the user CPU time reported by the Linux command time. Our estimate of the wall-clock time is off by $0.2\,\mathrm{s}$ because our quantum of time is one second.

It is possible to measure elapsed time more accurately than to within a second; the UNIX program time obviously does. There is a Fortran function etime, which returns elapsed time with a resolution of a nanosecond, but this function has no equivalent in the ANSI C library; the sub-second timers available to C programs, such as gettimeofday, come from POSIX rather than from the C standard. For the purpose of timing a program running for some hours under PBS, one-second resolution is good enough. For the purpose of timing loops inside the program, of course, it isn't, but then you are interested in the CPU time, not the wall-clock time, and you should use function clock. Its nominal resolution is one microsecond (1/CLOCKS_PER_SEC), although the actual granularity is often coarser.



Zdzislaw Meglicki
2004-04-29