
   
Program mkrandfile

Here is what the program looks like:

/*
 * Create a file containing random pattern of specified length.
 *
 * %Id: mkrandfile.c,v 1.1 2003/09/11 19:32:16 gustav Exp %
 *
 * %Log: mkrandfile.c,v %
 * Revision 1.1  2003/09/11 19:32:16  gustav
 * Initial revision
 *
 */

#include<stdio.h>
#include<stdlib.h>
#include<unistd.h>
#include<errno.h>
#define ARRAY_LENGTH 1048576

int main(int argc, char *argv[])
{
  FILE *fp;
  static int junk[ARRAY_LENGTH];  /* 4 MB: static, to keep it off the stack */
  int block, i, number_of_blocks = 0;

  /* variables for reading the command line */

  extern char *optarg;
  char *name = NULL;
  int c;

  /* error handling: errno is declared by <errno.h> and read by perror */

  while ((c = getopt(argc, argv, "f:l:h")) != -1)
    switch(c) {
    case 'f': 
      name = optarg;
      (void)printf("writing on %s\n", name);
      break;
    case 'l':
      sscanf (optarg, "%d", &number_of_blocks);
      (void)printf("writing %d blocks of %d random integers\n", 
		   number_of_blocks, ARRAY_LENGTH);
      break;
    case 'h':
      printf ("synopsis: %s -f <file> -l <length>\n", argv[0]);
      exit(0);
    case '?':
      printf ("synopsis: %s -f <file> -l <length>\n", argv[0]);
      exit(1); 
    }

  if (number_of_blocks < 1) {
    printf ("initialize number of blocks with -l option\n");
    printf ("use -h for help\n");
    exit(2);
  }

  if (name == NULL) {
    printf ("initialize file name with -f option\n");
    printf ("use -h for help\n");
    exit(2);
  }
  
  if (! (fp = fopen(name, "w"))) {
    perror (name);
    exit(3);
  }

  srand(28); 
  for (block = 0; block < number_of_blocks; block++){ 
    for (i = 0; i < ARRAY_LENGTH; junk[i++] = rand()); 
    if (fwrite(junk, sizeof(int), ARRAY_LENGTH, fp) != ARRAY_LENGTH) {
      perror (name); 
      exit (4);
    }
  } 
  if (fclose(fp) != 0) {
    perror (name);
    exit (5);
  }

  exit (0);
}

Let me explain what the program does and how. This should also help you brush up on your C and UNIX programming skills.

The program begins by calling the standard UNIX function getopt, which reads the command line. The string "f:l:h" tells getopt which option switches to expect on the command line. Here we expect -f followed by an argument, -l followed by an argument (the ":" after the option letter indicates that an argument is required), and -h, which takes no argument.

The options can appear on the command line in any order, which is a nice thing about getopt. Nor does getopt itself require that all of them be present.

The switch statement inside the while loop processes the various cases. If the option character is "f" (getopt always implies the minus sign in front of the option character), then the argument that follows -f, which is always a string, is the name of the file to which we are going to write our random integers. Observe that name is just a pointer to a character, initialized to NULL. We don't have to copy the string pointed to by optarg, which is a pointer to a character too. Instead we simply make name point at the same memory location as optarg. This is safe because optarg points into the argv array, which persists for the life of the program. This way we can refer to the file name passed on the command line as name.

If the option character is "l", the option (-l) should be followed by an integer. This integer is interpreted as the number of blocks, of ARRAY_LENGTH (1048576, roughly one million) integers each, to be written to the file. But recall that optarg is always a string, so we have to convert this string to an integer. This can be done in various ways, most of them quite cumbersome, but there is one easy and relatively safe way to do it. I use the function sscanf, which scans a string the same way that scanf scans an input line. The string in question is pointed to by optarg, and the format is "%d", which means ``an integer''. The last parameter in the call to sscanf is the address of number_of_blocks, which is where sscanf writes the value of the integer. If sscanf does not find an integer at the start of optarg, it writes nothing there, in which case number_of_blocks retains its original value of zero.

The last option defined by "f:l:h" is -h. This option does not take an argument. If it is encountered on the command line, the program prints a brief synopsis on standard output and exits cleanly, i.e., with the exit status zero, right there and then.

If getopt encounters any other option on the command line, it returns "?", and the last case of the switch statement prints the synopsis of the program on standard output and exits the program raising an error flag, i.e., with the exit status set to one.

Having collected the information from the command line, we now check two things, just in case. The first is the value of number_of_blocks. This value may be incorrect if, e.g., it has not been specified on the command line, in which case it is zero, or if it has been specified incorrectly, e.g., made negative. If either of the two conditions holds, we exit with the exit status set to 2. The second thing we check is whether we have the name of the file. If the user has not specified the name of the file on the command line, name stays set to NULL. If we detect this condition, we exit the program with the exit status set to 2 too.

These two simple checks do not exhaust the countless ways in which the user may get something wrong. For example, the user may type a weird file name, which may be too long or may contain control codes, or the number of blocks requested may be too large. Some of these problems may be caught by checks in the remainder of the program, but some may remain undetected until it is too late and something horrible happens. This is how security bugs are born. The command line reading procedure in our program is somewhat fragile, but it will do for now.

Having obtained all the information from the command line and having checked that it is sensible, we are now going to open the file for writing. The statement that does it is

  if (! (fp = fopen(name, "w"))) {
    perror (name);
    exit(3);
  }

Observe that we don't just open the file. We attempt to open it and then immediately check whether we have been successful. If the attempt fails, fopen returns NULL, which counts as false in a C condition. The negation operator, "!", turns it into true, and the block following the if is executed. Inside this block perror prints the name of the file, pointed to by name, followed by a colon and a brief explanation of the error condition encountered. Then we abort with the exit status set to 3. It is here that we can trap an incorrectly entered file name. The failure to open the file may have many causes, though: e.g., there may already be a write-protected file of this name, or the user may have no write permission on the directory in which the program executes.

If fopen returns anything but NULL, this is interpreted as true, the negation operator "!" makes it into false, and we go on to the remainder of the program.

The remainder is short and sweet. We seed the random number generator with 28 (nothing special about this number) and then write the random numbers generated by the function rand to the file. Observe that we don't write a character at a time. Instead we collect about a million random integers in an array called junk, and only after the array is full do we push its content onto the file in a single long write. This is the most efficient way of writing data: large blocks, not little drops. But then recall that the blocks in our program are not so large as to overflow output buffers. A million integers fill only 4 MB (assuming 4-byte integers), which is nothing.

Function fwrite is called in a way reminiscent of fopen. This function returns the number of items of size sizeof(int) it has managed to write. This number should be equal to ARRAY_LENGTH if the write has been successful. Otherwise, we have a write error. For example, we may have run out of disk space, or out of quota, and then the number is going to be less than ARRAY_LENGTH. If we detect that fwrite has not written as many integers as it should have, we call perror, which should tell us what happened, and then abort with the exit status set to 4.

The last thing the program does is close the file. Function fclose returns zero if there are no problems; otherwise it returns EOF and sets errno. We don't inspect errno ourselves, relying on perror to report it instead. If we have encountered an error condition at this stage, we abort the program with the exit status set to 5.

Otherwise, i.e., if everything has gone just fine, we exit the program cleanly, with the exit status set to zero.

We are going to make and install this program by calling UNIX make. But before we can do this we have to write a Makefile. Here it is:

#
# %Id: Makefile,v 1.1 2003/09/11 19:31:57 gustav Exp %
#
# %Log: Makefile,v %
# Revision 1.1  2003/09/11 19:31:57  gustav
# Initial revision
#
DESTDIR = /N/B/gustav/bin
MANDIR  = /N/B/gustav/man/man1
CC = cc
TARGET = mkrandfile

all: $(TARGET)

$(TARGET): $(TARGET).o
	$(CC) -o $@ $(TARGET).o

$(TARGET).o: $(TARGET).c
	$(CC) -c $(TARGET).c

install: all $(TARGET).1
	[ -d $(DESTDIR) ] || mkdirhier $(DESTDIR)
	install $(TARGET) $(DESTDIR)
	[ -d $(MANDIR) ] || mkdirhier $(MANDIR)
	install $(TARGET).1 $(MANDIR)

clean:
	rm -f *.o $(TARGET)

clobber: clean
	rcsclean

Our target all is defined as being $(TARGET), which here evaluates to mkrandfile. Observe that this Makefile is quite general and you can reuse it for simple programs with other names too.

To make it we have to have $(TARGET).o, i.e., mkrandfile.o first. Once we have it, we link it by calling

$(CC) -o $@ $(TARGET).o

which evaluates to

cc -o mkrandfile mkrandfile.o

because $@ always evaluates to the target itself.

To make mkrandfile.o, we must have the source file, mkrandfile.c. Once we have made any changes to it, we compile it with the command

$(CC) -c $(TARGET).c

which evaluates to

cc -c mkrandfile.c

This command creates mkrandfile.o, but it doesn't link it.

The installation is done by calling the UNIX program install, which works a little like cp, but can also do some checking, set permissions, ownership, etc. Here we install our application in the directory pointed to by $(DESTDIR), which is simply a bin subdirectory in my home directory on the AVIDD cluster. We also install the manual page that describes the program in the man/man1 subdirectory of my home directory.

Observe that we check whether the directories exist in the first place, and we make them if they don't. The program mkdirhier, which belongs to the X11 (X11R6 in the case of AVIDD) toolkit, will make the whole directory hierarchy if needed.

All three files, mkrandfile.c, mkrandfile.1 and Makefile, are under RCS control. RCS stands for Revision Control System and it helps maintain the code in an orderly fashion. All changes made to the code are remembered and can be reversed. RCS maintains the log and version number for every file under its management too.

Here's how it works:

[gustav@bh1 mkrandfile]$ pwd
/N/B/gustav/src/mkrandfile
[gustav@bh1 mkrandfile]$ ls
RCS
[gustav@bh1 mkrandfile]$ make
co  RCS/Makefile,v Makefile
RCS/Makefile,v  -->  Makefile
revision 1.1
done
co  RCS/mkrandfile.c,v mkrandfile.c
RCS/mkrandfile.c,v  -->  mkrandfile.c
revision 1.1
done
cc -c mkrandfile.c
cc -o mkrandfile mkrandfile.o
[gustav@bh1 mkrandfile]$ make install
co  RCS/mkrandfile.1,v mkrandfile.1
RCS/mkrandfile.1,v  -->  mkrandfile.1
revision 1.1
done
[ -d /N/B/gustav/bin ] || mkdirhier /N/B/gustav/bin
install mkrandfile /N/B/gustav/bin
[ -d /N/B/gustav/man/man1 ] || mkdirhier /N/B/gustav/man/man1
install mkrandfile.1 /N/B/gustav/man/man1
[gustav@bh1 mkrandfile]$ ls
Makefile  RCS  mkrandfile  mkrandfile.1  mkrandfile.c  mkrandfile.o
[gustav@bh1 mkrandfile]$ make clobber
rm -f *.o mkrandfile
rcsclean
rm -f Makefile
rm -f mkrandfile.c
rm -f mkrandfile.1
[gustav@bh1 mkrandfile]$

The source lives in the /N/B/gustav/src/mkrandfile/RCS directory. The directory right above it, /N/B/gustav/src/mkrandfile, is initially empty apart from the RCS subdirectory. The command make recognizes the presence of RCS and checks out the required files with the RCS command co. The first file to be checked out is, of course, Makefile. make then reads it and begins to execute the instructions in it: mkrandfile.c is checked out, compiled, and then linked. make install checks out the manual page and installs the binary and the manual entry in the appropriate directories, after it has checked that they exist in the first place.

The make process leaves various files in the source directory, i.e., the directory that is above the RCS subdirectory. The command make clobber calls make clean first. This deletes the object file and the executable. Then the RCS command rcsclean deletes the working copies of files that are unchanged from the versions archived in the RCS subdirectory.

The manual file is written in UNIX troff. troff is a rather ancient text processor (more than 30 years old). But it is so firmly embedded in UNIX and UNIX tradition (text processing was one of the first jobs UNIX was put to) that it is impossible, or at least unwise, to untangle the two.

Here is what the manual page source looks like:

.\" Process this file with
.\" groff -man -Tascii mkrandfile.1
.\"
.\" %Id: mkrandfile.1,v 1.2 2003/09/11 21:05:30 gustav Exp %
.\"
.\" %Log: mkrandfile.1,v %
.\" Revision 1.2  2003/09/11 21:05:30  gustav
.\" Filled the whole of TH.
.\"
.\" Revision 1.1  2003/09/11 19:32:07  gustav
.\" Initial revision
.\"
.TH MKRANDFILE 1 "SEPTEMBER 2003" I590/7462 "I590 Programmer's Manual" 
.SH NAME
mkrandfile \- create a file full of randomized integers
.SH SYNOPSIS
.B mkrandfile 
-f \fIfilename\fR
-l \fIsize\fR
[-h]

.SH DESCRIPTION
.B mkrandfile
creates a file full of randomized integers.  Can be used to create
files for disk-based benchmarks.

.SH OPTIONS
.IP \fB-f\fR
must be followed by output filename.  e.g. foo

.IP \fB-l\fR
must be followed by number of 4MB blocks.  e.g.  100

.IP \fB-h\fR
Print a brief help message

.SH DIAGNOSTICS
Self explanatory and numerous.

.SH EXAMPLES

$ mkrandfile -f /N/gpfs/gustav/tryme -l 10
.br
writing on /N/gpfs/gustav/tryme
.br
writing 10 blocks of 1048576 random integers
.br
$ ls -l /N/gpfs/gustav/tryme
.br
-rw-r--r--    1 gustav   ucs      41943040 Aug 29 14:22 /N/gpfs/gustav/tryme
.br


.SH BUGS
Command line processing may be brittle.

.SH AUTHOR
Zdzislaw Meglicki <gustav@indiana.edu>

Read man 7 man on AVIDD to learn about the meaning of the troff and man directives used in the above listing.

Having made and installed the program, you can bring up its man page by typing:

[gustav@bh1 gustav]$ man mkrandfile

MKRANDFILE(1)        I590 Programmer's Manual       MKRANDFILE(1)

NAME
       mkrandfile - create a file full of randomized integers

SYNOPSIS
       mkrandfile -f filename -l size [-h]

DESCRIPTION
       mkrandfile  creates  a  file  full of randomized integers.
       Can be used to create files for disk-based benchmarks.

OPTIONS
       -f     must be followed by output filename.  e.g. foo

       -l     must be followed by number  of  4MB  blocks.   e.g.
              100

       -h     Print a brief help message

DIAGNOSTICS
       Self explanatory and numerous.

EXAMPLES
       $ mkrandfile -f /N/gpfs/gustav/tryme -l 10
       writing on /N/gpfs/gustav/tryme
       writing 10 blocks of 1048576 random integers
       $ ls -l /N/gpfs/gustav/tryme
       -rw-r--r--     1  gustav    ucs      41943040 Aug 29 14:22
       /N/gpfs/gustav/tryme

BUGS
       Command line processing may be brittle.

AUTHOR
       Zdzislaw Meglicki <gustav@indiana.edu>

I590/7462                 SEPTEMBER 2003            MKRANDFILE(1)

[gustav@bh1 gustav]$

Let me summarize what we have done in this section. First we have written and discussed a very, very, very simple UNIX C program, which we are going to use to generate a very large file of random integers in the following sections. We have also written a Makefile and a man page for this program. Finally, we have run make to compile, link and install the program and its man page.

The three files: the source code, the Makefile and the manual page, together constitute the application. If any one of the three is missing, the job is botched. The job can be botched, of course, in many other ways too, e.g., the program may have bugs, the manual may not format correctly, and the Makefile may have incorrect instructions in it. But you must always remember that it is not enough to just write the program. You must provide the means of making it into an application and you must provide documentation for it too.


Zdzislaw Meglicki
2004-04-29