
   
Program xrandfile

Here is the listing of the program:

/*
 * Read an existing file full of random integers and perform
 * some arbitrary operation on it.
 *
 * %Id: xrandfile.c,v 1.2 2003/09/12 21:28:33 gustav Exp %
 *
 * %Log: xrandfile.c,v %
 * Revision 1.2  2003/09/12 21:28:33  gustav
 * Converted mkrandfile to xrandfile: an application that reads the file and
 * counts the number of bytes in it.
 *
 * Revision 1.1  2003/09/12 17:55:12  gustav
 * Initial revision
 *
 *
 */

#include<stdio.h>
#include<stdlib.h>
#include<unistd.h>
#include<errno.h>
#define BLOCK_SIZE 1048576
#define SYNOPSIS printf ("synopsis: %s -f <file> [ -l <blocks_per_read> ]\n", argv[0])
#define CLOSE if (fclose(fp) != 0) { perror (name); exit (5); }

int main(int argc, char *argv[])
{
  FILE *fp;
  int *junk, size_of_junk, n_of_items, blocks_per_read = 1;
  long long count;

  /* Note: we are working in a 32-bit architecture. In this architecture
     long is the same as int (see limits.h) and LONG_MAX = INT_MAX
     = 2^31 - 1 = 2147483647. We have to use (long long), for which
     LLONG_MAX = 2^63 - 1 = 9223372036854775807. This is going to
     have some very limiting ramifications, when we get to use ROMIO.
  */

  /* variables for reading the command line */

  extern char *optarg;
  char *name = NULL;
  int c;

  /* error handling: errno is declared by <errno.h>, so no extern
     declaration is needed here */

  size_of_junk = blocks_per_read * BLOCK_SIZE * sizeof(int);

  while ((c = getopt(argc, argv, "f:l:h")) != -1)
    switch(c) {
    case 'f': 
      name = optarg;
      (void)printf("reading %s\n", name);
      break;
    case 'l':
      if ((1 != sscanf (optarg, "%d", &blocks_per_read)) ||
	  (blocks_per_read < 1)) {
	SYNOPSIS;
	exit(1);
      }	  
      else {
	size_of_junk = blocks_per_read * BLOCK_SIZE * sizeof(int);
	printf("reading in chunks of size %d bytes\n", size_of_junk);
      }
      break;
    case 'h':
      SYNOPSIS;
      exit(0);
    case '?':
      SYNOPSIS;
      exit(1); 
    }

  if (name == NULL) {
    SYNOPSIS;
    exit(2);
  }

  if (! (junk = (int*) malloc(size_of_junk))) {
    perror ("malloc");
    exit(3);
  }
  else 
    printf("allocated %d bytes to junk\n", size_of_junk);

  if (! (fp = fopen(name, "r"))) {
    perror (name);
    exit(4);
  }

  count = 0LL;
  while (!feof(fp)){
    n_of_items = fread (junk, sizeof(int), blocks_per_read * BLOCK_SIZE, fp);
    if (ferror(fp)){
      perror("fread");
      CLOSE;
      exit (6);
    }
    count = (long long) (count + n_of_items); 
  }

  printf ("read %lld bytes\n", (long long)(count * sizeof(int)));
  free((void *)junk);
  CLOSE;
  exit (0);
}

Let me explain briefly how the program works. We begin by reading the command line. The options are as before, but this time the option -l has a different meaning: it specifies the size of the read buffer in units of 4 MB (BLOCK_SIZE integers of 4 bytes each). Because the size of the read buffer is no longer predefined, we have to allocate it dynamically.

Observe that when I process the option -l, I use the number of items matched by the function sscanf to detect a possible error. The code that prints the synopsis of the program is now represented by a macro, SYNOPSIS, which is defined in the preamble of the program.
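The sscanf check can be isolated in a small function. The following is only a sketch: the name parse_blocks is hypothetical and does not appear in xrandfile.c, but the test it performs is the same as in the -l branch of the listing.

```c
#include <stdio.h>

/* Hypothetical helper, not part of xrandfile.c: parse a positive
   block count from a command-line argument.
   Returns 0 on success and -1 if the argument is unusable;
   *blocks is modified only on success. */
int parse_blocks(const char *arg, int *blocks)
{
  int value;

  /* sscanf returns the number of conversions it completed; anything
     other than 1 means that no integer was found at the start of arg */
  if (sscanf(arg, "%d", &value) != 1)
    return -1;

  /* zero and negative block counts make no sense for a read buffer */
  if (value < 1)
    return -1;

  *blocks = value;
  return 0;
}
```

A call such as parse_blocks(optarg, &blocks_per_read) would replace the inline test. Note that, like the original, this accepts trailing characters after the number (for example "10xyz"), which is one reason the manual page describes command line processing as fragile.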

After we have finished reading the command line and checking for possible input errors, we allocate space for the junk array by calling malloc. If malloc returns NULL, which means that it has failed to allocate the requested space, we print an error message and exit. The amount of requested space in bytes is stored in the variable size_of_junk, which is calculated at the beginning of the program and then recalculated if the -l option is used. This way we set a default value for it, which is 4 MB.
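The program stores the buffer size in an int and does not check whether the product fits. A possible guard is sketched below. The function junk_bytes is hypothetical (it is not in xrandfile.c), and the sketch assumes a 4-byte int, as on the 32-bit architecture discussed in the listing.

```c
#include <limits.h>
#include <stddef.h>

#define BLOCK_SIZE 1048576

/* Hypothetical helper, not part of xrandfile.c: compute the size of
   the read buffer in bytes for a given number of blocks per read.
   Returns 0 if the request is invalid or would not fit in an int,
   which is the type xrandfile uses for size_of_junk. */
size_t junk_bytes(int blocks_per_read)
{
  /* one block is BLOCK_SIZE integers, i.e., 4 MB for a 4-byte int */
  size_t bytes_per_block = (size_t) BLOCK_SIZE * sizeof(int);

  if (blocks_per_read < 1)
    return 0;

  /* compare by division, so that the check itself cannot overflow */
  if ((size_t) blocks_per_read > (size_t) INT_MAX / bytes_per_block)
    return 0;

  return (size_t) blocks_per_read * bytes_per_block;
}
```

With a 4-byte int the largest accepted value is 511 blocks; 512 blocks would be exactly 2^31 bytes, one more than INT_MAX.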

Having allocated the space, we open the file, this time for reading.

Then we initialize the variable count, which will be used to count the number of integers found in the file, to 0LL. This notation means "zero in the long-long format". We have to use "long-long" for count in order to count numbers of integers that may be above INT_MAX, which is only 2^31 - 1 = 2147483647, i.e., about 2 billion. Our GPFS files may contain more than 2 billion integers.
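To make the arithmetic concrete, here is a minimal sketch with two hypothetical helpers (neither appears in xrandfile.c). It assumes a 4-byte int, for which INT_MAX = 2147483647.

```c
#include <limits.h>

/* Hypothetical helper: would this item count still fit in an int? */
int fits_in_int(long long count)
{
  return count <= (long long) INT_MAX;
}

/* Hypothetical helper: convert a count of integers to a count of
   bytes, with the multiplication carried out in long long */
long long items_to_bytes(long long count)
{
  return count * (long long) sizeof(int);
}
```

For example, a file of 3000000000 integers gives fits_in_int(3000000000LL) equal to 0, and items_to_bytes(3000000000LL) equal to 12000000000, a value that needs more than 32 bits as well.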

The reading of the file is done within the while loop, which checks for the end of file at the top of the loop. We read chunks of data of size size_of_junk with fread, which returns the number of items (in this case 32-bit integers) read in a single operation. The number of items read is then added to count.

But before we increment count, we check whether the read has failed. This is not done by inspecting the value returned by fread, but by calling a special function, ferror. This function operates on the file pointer, fp, as does the function that detects the end-of-file condition, feof. If there has been an error, we print the error message, then attempt to close the file (this is what the CLOSE macro does), and exit with error code 6.
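The reading loop described above can be sketched as a stand-alone function. The name count_items and its parameters are hypothetical, introduced only for illustration; unlike xrandfile, the sketch reports failures through its return value instead of calling exit.

```c
#include <stdio.h>

/* Hypothetical helper, not part of xrandfile.c: count the whole items
   of item_size bytes in the file called name, reading at most
   items_per_read of them at a time into buf.
   Returns the number of items, or -1 if the file cannot be opened,
   read, or closed. */
long long count_items(const char *name, void *buf,
                      size_t item_size, size_t items_per_read)
{
  FILE *fp = fopen(name, "rb");
  long long count = 0LL;
  size_t n;

  if (fp == NULL)
    return -1LL;

  /* feof becomes true only after a read has hit the end of file,
     so the last pass through the loop may read a short chunk or
     nothing at all */
  while (!feof(fp)) {
    n = fread(buf, item_size, items_per_read, fp);
    /* a short read is ambiguous, so ask ferror whether it was an error */
    if (ferror(fp)) {
      fclose(fp);
      return -1LL;
    }
    count += (long long) n;
  }

  if (fclose(fp) != 0)
    return -1LL;
  return count;
}
```

In terms of the program above, the call would be count_items(name, junk, sizeof(int), blocks_per_read * BLOCK_SIZE), and multiplying the result by sizeof(int) gives the number of bytes that xrandfile prints.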

Assuming that all has gone well, we exit the while loop cleanly and print the number of bytes read from the file on standard output. Then we free the allocated storage, close the file, and exit.

The Makefile for this program looks exactly the same as the Makefile for mkrandfile, with the exception that the name mkrandfile is replaced with xrandfile.

The manual entry for the program looks as follows:

XRANDFILE(1)        I590 Programmer's Manual       XRANDFILE(1)

NAME
       xrandfile  -  read  a file in chunks of various sizes and
       count number of bytes in it

SYNOPSIS
       xrandfile -f filename [ -l blocks-per-read ] [-h]

DESCRIPTION
       xrandfile reads a file in  chunks  of  various  sizes  and
       counts  number  of bytes in it. Can be used for disk-based
       benchmarks.

OPTIONS
       -f     This parameter must be followed by input  filename,
              e.g. foo.

       -l     This  parameter is optional. It must be followed by
              an integer,  which  specifies  the  number  of  4MB
              blocks  per  read. The default is one 4MB block per
              read.

       -h     Print a brief help message.

DIAGNOSTICS
       Self explanatory and numerous.

EXAMPLES
       $ xrandfile -f test -l 10
       reading test
       reading in chunks of size 41943040 bytes
       allocated 41943040 bytes to junk
       read 419430400 bytes
       $ ls -l test
       -rw-r--r--    1 gustav   ucs      419430400 Sep  12  16:20 test

BUGS
       Command line processing may be fragile. There is no check
       that the requested buffer size does not exceed INT_MAX,
       although malloc may catch this.

AUTHOR
       Zdzislaw Meglicki <gustav@indiana.edu>

I590/7462                 SEPTEMBER 2003            XRANDFILE(1)


Zdzislaw Meglicki
2004-04-29