Here is what the program looks like:
/*
 * Create a file containing random pattern of specified length.
 *
 * %Id: mkrandfile.c,v 1.1 2003/09/11 19:32:16 gustav Exp %
 *
 * %Log: mkrandfile.c,v %
 * Revision 1.1 2003/09/11 19:32:16 gustav
 * Initial revision
 *
 */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>

#define ARRAY_LENGTH 1048576

main(argc, argv)
    int argc;
    char *argv[];
{
    FILE *fp;
    int junk[ARRAY_LENGTH], block, i, number_of_blocks = 0;

    /* variables for reading the command line */
    extern char *optarg;
    char *name = NULL;
    int c;

    /* error handling */
    extern int errno;

    while ((c = getopt(argc, argv, "f:l:h")) != EOF)
        switch(c) {
        case 'f':
            name = optarg;
            (void)printf("writing on %s\n", name);
            break;
        case 'l':
            sscanf (optarg, "%d", &number_of_blocks);
            (void)printf("writing %d blocks of %d random integers\n",
                         number_of_blocks, ARRAY_LENGTH);
            break;
        case 'h':
            printf ("synopsis: %s -f <file> -l <length>\n", argv[0]);
            exit(0);
        case '?':
            printf ("synopsis: %s -f <file> -l <length>\n", argv[0]);
            exit(1);
        }

    if (number_of_blocks < 1) {
        printf ("initialize number of blocks with -l option\n");
        printf ("use -h for help\n");
        exit(2);
    }

    if (name == NULL) {
        printf ("initialize file name with -f option\n");
        printf ("use -h for help\n");
        exit(2);
    }

    if (! (fp = fopen(name, "w"))) {
        perror (name);
        exit(3);
    }

    srand(28);
    for (block = 0; block < number_of_blocks; block++){
        for (i = 0; i < ARRAY_LENGTH; junk[i++] = rand());
        if (fwrite(junk, sizeof(int), ARRAY_LENGTH, fp) != ARRAY_LENGTH) {
            perror (name);
            exit (4);
        }
    }

    if (fclose(fp) != 0) {
        perror (name);
        exit (5);
    }

    exit (0);
}
Let me explain what the program does and how. This will also help you brush up on your C and UNIX programming skills.
The program begins by calling a standard UNIX function
getopt, which reads the command line. The string
"f:l:h" tells getopt what option switches we
expect to see on the command line. And so, we expect to see
-f followed by an argument, then -l followed by
an argument (the presence of the argument is indicated by ":"),
and -h, which doesn't have a following argument.
The options can appear on the command line in any order. This is a nice thing about getopt. And it is not necessary to use all the options either.
The switch statement that is inside the while loop
processes various cases. And so, if the option character is
"f" (the minus sign in front of the option character is always
implied by getopt), then the argument that follows -f,
and this argument is always going to be a string, is going
to be the name of the file on which we are going to write our randomized
integers. Observe that name is just a pointer to a character,
and it is initialized to NULL. We don't have to copy
the content of the string pointed to by optarg, the latter being
a pointer to a character too. Instead we simply request that name
should point at the same memory location as optarg. This way
we can use the name of the file passed to the program on the
command line, by referring to it as name.
If the option character is "l", the option (-l) should
be followed by an integer. This integer is going to be interpreted
as the number of blocks of one million integers each to be written
on the file. But recall that optarg is always a string.
So here we have to convert this string to an integer. This can be
done in various ways, most of them quite cumbersome, but there is
one easy and relatively safe way to do this. I use function
sscanf, which scans a string the same way that scanf
scans an input line. The string in question is pointed to by
optarg, and we expect the format to be "%d", which
means ``an integer''. The last parameter in the call to
sscanf is the address of number_of_blocks.
Function sscanf will go there and will write the value of
the integer in that location. If sscanf does not find
an integer in optarg it won't write anything there, in
which case number_of_blocks will retain its original
value of zero.
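The behaviour of sscanf described above can be checked with a few lines of C. This is a sketch under the assumption that we wrap the call in a helper; scan_block_count is a hypothetical name and does not appear in the program.

```c
#include <stdio.h>

/* Convert text to an int the same way the program does.
   Returns whatever sscanf returns: 1 if an integer was converted,
   0 if text does not begin with an integer, EOF if text is empty.
   When nothing is converted, *value is left untouched. */
int scan_block_count(const char *text, int *value)
{
    return sscanf(text, "%d", value);
}
```

With value preset to zero, "10" yields 10, "-3" yields -3, and "abc" leaves the zero in place. Observe, however, that "12abc" is accepted as 12, because sscanf stops at the first character that cannot belong to the integer. This is one reason why the method is only relatively safe, and why checking the return value is a good habit.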
The last option defined by "f:l:h" is -h. This option
does not take an argument. If it is encountered on the command line,
the program will print the brief synopsis on standard output and
it will exit the program cleanly, i.e., with the exit status zero,
right there and then.
If function getopt encounters any other option on the command
line, it is going to return "?", and the last case of
the switch statement will print the synopsis of the program
on standard output and will exit the program raising an error flag,
i.e., the exit status will be set to one.
Having collected the information from the command line, we are now going
to check two things, just in case. The first one is the value of
number_of_blocks. This value may be incorrect if, e.g., it has
not been specified on the command line, in which case it is still zero, or if
it has been specified incorrectly, e.g., made negative. If either of the
two conditions holds, we're going to exit with the exit status set to 2.
The second thing we're going to check is if we have the name of the
file. If the user has not specified the name of the file on the
command line, name will stay set to NULL. If we detect
this condition, we're going to exit the program with the exit status
set to 2 too.
These two simple checks do not exhaust the countless ways in which the user may get something wrong. For example, the user may type a weird file name, which may be too long or may contain control codes, or the number of blocks requested may be too large. Some of the problems may be captured by checks in the remainder of the program, but some may remain undetected until it is too late and something horrible happens. This is how security bugs are born. The command line reading procedure in our program is somewhat fragile, but it will do for now.
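For comparison, here is a sketch of a stricter way to read the number of blocks. It is not what mkrandfile does: it swaps sscanf for the standard function strtol, and both the function name parse_block_count and the ceiling MAX_BLOCKS are assumptions made up for this illustration.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical ceiling: 1024 blocks of 4 MB each, assuming 4-byte ints. */
#define MAX_BLOCKS 1024

/* Returns 0 and stores the count in *blocks if text is a clean decimal
   integer between 1 and MAX_BLOCKS. Returns -1 otherwise and leaves
   *blocks untouched. */
int parse_block_count(const char *text, int *blocks)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(text, &end, 10);
    if (end == text)
        return -1;                   /* no digits at all */
    if (*end != '\0')
        return -1;                   /* trailing junk, e.g. "12abc" */
    if (errno == ERANGE)
        return -1;                   /* does not fit in a long */
    if (v < 1 || v > MAX_BLOCKS)
        return -1;                   /* zero, negative or too large */
    *blocks = (int)v;
    return 0;
}
```

Unlike the sscanf call, this rejects "12abc", an empty string, and a number with twenty digits, and it refuses requests above the ceiling before any disk space is consumed.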
Having obtained all the information from the command line and having checked that it is sensible, we are now going to open the file for writing. The statement that does it is
if (! (fp = fopen(name, "w"))) {
    perror (name);
    exit(3);
}
Observe that we don't just open the file. We attempt to
open it and then immediately check if we've been successful or not.
If the attempt is unsuccessful, function fopen is going to
return NULL, which counts as false when tested in a
condition. The negation operator, "!", will
make it into true and the clause following the if
statement will be executed. Inside this clause function perror
is going to print, on standard error, the name of the file, pointed to by name,
followed by a colon and by a brief explanation of the
error condition encountered. Then we abort having set the exit status to 3.
It is here that we can trap
an incorrectly entered file name. The failure to open the file may be
due to many reasons though. Amongst them may be, e.g., that there is
already a write protected file of this name, or that the user
has no write permission on the directory in which the program
executes.
If fopen returns anything but NULL, this is interpreted
as true, the negation operator "!" makes it into false,
and then we go on to the remainder of the program.
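The same open-and-check idiom can be written with the test against NULL spelled out. This is a small sketch only; open_for_writing is a hypothetical helper name, and unlike the program it returns to the caller instead of calling exit.

```c
#include <stdio.h>

/* Try to open name for writing. On failure, report the reason on
   standard error and return NULL, so that the caller can decide
   which exit status to use. */
FILE *open_for_writing(const char *name)
{
    FILE *fp = fopen(name, "w");

    if (fp == NULL)
        perror(name);    /* prints the name, a colon, and the reason */
    return fp;
}
```

If the directory in the path does not exist, the function returns NULL and perror prints the file name followed by a message along the lines of "No such file or directory". The exact wording of the message depends on the system.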
The remainder is short and sweet. We seed the random number generator
with 28 (nothing special about this number) and then write
the random numbers generated by function rand on the file.
Observe that we don't write a character at a time. Instead we
collect about a million random integers in an array called
junk, and only after the array is full do we push its content
onto the file in a single long write. This is the most efficient way
of writing data: large blocks, not little drops. But then recall that
the blocks in our program are not so large as to overwhelm output
buffers. A million integers fill only 4 MB (with 4-byte integers), which is nothing.
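The arithmetic behind the 4 MB figure is easy to spell out. The sketch below assumes nothing beyond ARRAY_LENGTH from the listing; file_size_bytes is a hypothetical helper name.

```c
#define ARRAY_LENGTH 1048576   /* integers per block, as in mkrandfile.c */

/* Number of bytes the program writes for a given number of blocks.
   The result is computed in unsigned long long so that large requests
   do not overflow a 32-bit integer. */
unsigned long long file_size_bytes(unsigned int blocks)
{
    return (unsigned long long)blocks * ARRAY_LENGTH * sizeof(int);
}
```

On a system with 4-byte integers one block is 1048576 x 4 = 4194304 bytes, and ten blocks are 41943040 bytes, which is the size that ls -l reports in the example on the manual page. The wide return type matters: 512 blocks already amount to 2147483648 bytes, one more than a signed 32-bit integer can hold.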
Function fwrite is called in a way reminiscent of fopen.
This function returns the number of items of size
sizeof(int) it has managed to write. This number should be
equal to ARRAY_LENGTH if the write has been successful. Otherwise,
we have a write error. For example, we may have run out of disk space,
or out of quota, and then the number is going to be less than
ARRAY_LENGTH. If we detect that fwrite has not written
as many integers as it should have, we call function perror, which
should tell us what happened, and then abort having set the exit status to 4.
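Here is the write loop pulled out into a function of its own, as a sketch. The name write_blocks is hypothetical, the function reports failure to the caller instead of calling perror and exit, and the buffer is made static, which is my assumption about how one might keep 4 MB off the stack; the program itself declares junk as a local variable of main.

```c
#include <stdio.h>
#include <stdlib.h>

#define ARRAY_LENGTH 1048576   /* integers per block, as in mkrandfile.c */

/* Static storage keeps the large buffer off the stack. */
static int junk[ARRAY_LENGTH];

/* Write nblocks blocks of rand() output to fp. Returns 0 on success,
   -1 as soon as fwrite writes fewer items than requested. */
int write_blocks(FILE *fp, int nblocks)
{
    int block, i;

    for (block = 0; block < nblocks; block++) {
        for (i = 0; i < ARRAY_LENGTH; i++)
            junk[i] = rand();
        /* fwrite returns the number of items written, not bytes */
        if (fwrite(junk, sizeof(int), ARRAY_LENGTH, fp) != ARRAY_LENGTH)
            return -1;
    }
    return 0;
}
```

A caller would seed the generator with srand(28), open the file, and treat a return value of -1 the way the program treats a short write: call perror and exit with status 4.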
The last operation the program does is closing the file. Function
fclose should return zero if there are no problems, otherwise
it is going to return EOF and record the cause in errno. We don't analyze errno ourselves,
relying on function perror instead. If we have encountered
an error condition at this stage, we abort the program having set
the exit status to 5.
Otherwise, i.e., if everything has gone just fine, we exit the program cleanly, and set the exit status to zero.
We are going to make and install this program by calling UNIX
make. But before we can do this we have to edit
Makefile first. And here it is:
#
# %Id: Makefile,v 1.1 2003/09/11 19:31:57 gustav Exp %
#
# %Log: Makefile,v %
# Revision 1.1 2003/09/11 19:31:57 gustav
# Initial revision
#
DESTDIR = /N/B/gustav/bin
MANDIR = /N/B/gustav/man/man1
CC = cc
TARGET = mkrandfile

all: $(TARGET)

$(TARGET): $(TARGET).o
	$(CC) -o $@ $(TARGET).o

$(TARGET).o: $(TARGET).c
	$(CC) -c $(TARGET).c

install: all $(TARGET).1
	[ -d $(DESTDIR) ] || mkdirhier $(DESTDIR)
	install $(TARGET) $(DESTDIR)
	[ -d $(MANDIR) ] || mkdirhier $(MANDIR)
	install $(TARGET).1 $(MANDIR)

clean:
	rm -f *.o $(TARGET)

clobber: clean
	rcsclean
Our target all is defined as being $(TARGET),
which here evaluates to mkrandfile. Observe that this
Makefile is quite general and you can reuse it for
simple programs with other names too.
To make it we have to have $(TARGET).o, i.e.,
mkrandfile.o first. Once we have it,
we link it by calling
$(CC) -o $@ $(TARGET).o
which evaluates to
cc -o mkrandfile mkrandfile.o
because $@ always evaluates to the target itself.
To make mkrandfile.o, we must have the source file,
mkrandfile.c. Once we have made any changes to it,
we compile it with the command
$(CC) -c $(TARGET).c
which evaluates to
cc -c mkrandfile.c
This command creates
mkrandfile.o, but it doesn't link it.
The installation is done by calling UNIX program install,
which works a little like cp, but it can do some checking, set
permissions, ownership, etc. Here we install our application in
the directory pointed to by $(DESTDIR), which is simply
a bin subdirectory in my home directory on the AVIDD cluster.
We also install the manual page that describes the program in
the man/man1 subdirectory of my home directory.
Observe that we check if the directories exist in the first place,
and we make them if they don't. The program mkdirhier, which
belongs to the X11 toolkit (X11R6 in the case of AVIDD),
will make the whole directory hierarchy if needed.
All three files, mkrandfile.c, mkrandfile.1 and
Makefile are under the RCS control. RCS stands
for Revision Control System and it helps maintain the code
in an orderly fashion. All changes made to the code are remembered
and can be reversed. RCS maintains the log and version number
for every file under its management too.
Here's how it works:
[gustav@bh1 mkrandfile]$ pwd
/N/B/gustav/src/mkrandfile
[gustav@bh1 mkrandfile]$ ls
RCS
[gustav@bh1 mkrandfile]$ make
co RCS/Makefile,v Makefile
RCS/Makefile,v --> Makefile
revision 1.1
done
co RCS/mkrandfile.c,v mkrandfile.c
RCS/mkrandfile.c,v --> mkrandfile.c
revision 1.1
done
cc -c mkrandfile.c
cc -o mkrandfile mkrandfile.o
[gustav@bh1 mkrandfile]$ make install
co RCS/mkrandfile.1,v mkrandfile.1
RCS/mkrandfile.1,v --> mkrandfile.1
revision 1.1
done
[ -d /N/B/gustav/bin ] || mkdirhier /N/B/gustav/bin
install mkrandfile /N/B/gustav/bin
[ -d /N/B/gustav/man/man1 ] || mkdirhier /N/B/gustav/man/man1
install mkrandfile.1 /N/B/gustav/man/man1
[gustav@bh1 mkrandfile]$ ls
Makefile RCS mkrandfile mkrandfile.1 mkrandfile.c mkrandfile.o
[gustav@bh1 mkrandfile]$ make clobber
rm -f *.o mkrandfile
rcsclean
rm -f Makefile
rm -f mkrandfile.c
rm -f mkrandfile.1
[gustav@bh1 mkrandfile]$
The source lives in the
/N/B/gustav/src/mkrandfile/RCS
directory. The directory right above it is empty with the
exception of the RCS subdirectory. The command make
recognizes the presence of RCS and checks out the required
files with the RCS command co.
The first file to be checked out is, of course, Makefile. Then
the file is analyzed and make commences to execute
instructions in it. The source is compiled then linked.
make install installs the binary and the manual entry
in the appropriate directories, after it has checked that they
exist in the first place.
The make process leaves various files
in the source directory, i.e., the directory that is above the
RCS subdirectory. The command make clobber calls make clean
first. This deletes the object file and the executable. Then
the RCS command rcsclean deletes images of
files that are already archived in the RCS subdirectory.
The manual
file is written in UNIX troff. troff is a rather
ancient text processor (more than 30 years old). But it is so
firmly embedded in UNIX and UNIX tradition (document preparation was one of the
first practical uses of UNIX) that it is impossible, or at least unwise, to
untangle the two.
Here is what the manual page source looks like:
.\" Process this file with
.\" groff -man -Tascii mkrandfile.1
.\"
.\" %Id: mkrandfile.1,v 1.2 2003/09/11 21:05:30 gustav Exp %
.\"
.\" %Log: mkrandfile.1,v %
.\" Revision 1.2 2003/09/11 21:05:30 gustav
.\" Filled the whole of TH.
.\"
.\" Revision 1.1 2003/09/11 19:32:07 gustav
.\" Initial revision
.\"
.TH MKRANDFILE 1 "SEPTEMBER 2003" I590/7462 "I590 Programmer's Manual"
.SH NAME
mkrandfile \- create a file full of randomized integers
.SH SYNOPSIS
.B mkrandfile
-f \fIfilename\fR -l \fIsize\fR [-h]
.SH DESCRIPTION
.B mkrandfile
creates a file full of randomized integers.
Can be used to create files for disk-based benchmarks.
.SH OPTIONS
.IP \fB-f\fR
must be followed by output filename. e.g. foo
.IP \fB-l\fR
must be followed by number of 4MB blocks. e.g. 100
.IP \fB-h\fR
Print a brief help message
.SH DIAGNOSTICS
Self explanatory and numerous.
.SH EXAMPLES
$ mkrandfile -f /N/gpfs/gustav/tryme -l 10
.br
writing on /N/gpfs/gustav/tryme
.br
writing 10 blocks of 1048576 random integers
.br
$ ls -l /N/gpfs/gustav/tryme
.br
-rw-r--r-- 1 gustav ucs 41943040 Aug 29 14:22 /N/gpfs/gustav/tryme
.br
.SH BUGS
Command line processing may be brittle.
.SH AUTHOR
Zdzislaw Meglicki <gustav@indiana.edu>
Read
man 7 man on AVIDD to learn about the meaning of
troff and man directives
used in the above listing.
Having made and installed the program, you can bring up its man page by typing:
[gustav@bh1 gustav]$ man mkrandfile
MKRANDFILE(1)           I590 Programmer's Manual           MKRANDFILE(1)

NAME
       mkrandfile - create a file full of randomized integers

SYNOPSIS
       mkrandfile -f filename -l size [-h]

DESCRIPTION
       mkrandfile creates a file full of randomized integers.
       Can be used to create files for disk-based benchmarks.

OPTIONS
       -f     must be followed by output filename. e.g. foo

       -l     must be followed by number of 4MB blocks. e.g.
              100

       -h     Print a brief help message

DIAGNOSTICS
       Self explanatory and numerous.

EXAMPLES
       $ mkrandfile -f /N/gpfs/gustav/tryme -l 10
       writing on /N/gpfs/gustav/tryme
       writing 10 blocks of 1048576 random integers
       $ ls -l /N/gpfs/gustav/tryme
       -rw-r--r-- 1 gustav ucs 41943040 Aug 29 14:22
       /N/gpfs/gustav/tryme

BUGS
       Command line processing may be brittle.

AUTHOR
       Zdzislaw Meglicki <gustav@indiana.edu>

I590/7462                  SEPTEMBER 2003                  MKRANDFILE(1)
[gustav@bh1 gustav]$
Let me summarize what we have done in this section. First we have
written and discussed a very, very, very simple UNIX C program, which
we are going to use to generate a very large file of random integers
in the following sections.
We have also written a Makefile and a man page for this program.
Finally, we have run make to compile, link and install
the program and its man page.
The three files: the source code, the Makefile and the manual page, together constitute the application. If any one of the three is missing, the job is botched. The job can be botched, of course, in many other ways too, e.g., the program may have bugs, the manual may not format correctly, and the Makefile may have incorrect instructions in it. But you must always remember that it is not enough to just write the program. You must provide the means of making it into an application and you must provide documentation for it too.