Here is the listing of the program:
/*
* Read an existing file full of random integers and perform
* some arbitrary operation on it.
*
* %Id: xrandfile.c,v 1.2 2003/09/12 21:28:33 gustav Exp %
*
* %Log: xrandfile.c,v %
* Revision 1.2 2003/09/12 21:28:33 gustav
* Converted mkrandfile to xrandfile: an application that reads the file and
* counts the number of bytes in it.
*
* Revision 1.1 2003/09/12 17:55:12 gustav
* Initial revision
*
*
*/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>

#define BLOCK_SIZE 1048576
#define SYNOPSIS printf ("synopsis: %s -f <file> [ -l <blocks_per_read> ]\n", argv[0])
#define CLOSE if (fclose(fp) != 0) { perror (name); exit (5); }

int main(int argc, char *argv[])
{
  FILE *fp;
  int *junk, size_of_junk, n_of_items, blocks_per_read = 1;
  long long count;
  /* Note: we are working in a 32-bit architecture. In this architecture
     long is the same as int (see limits.h) and LONG_MAX = INT_MAX
     = 2^31 - 1 = 2147483647. We have to use (long long), for which
     LLONG_MAX = 2^63 - 1 = 9223372036854775807. This is going to
     have some very limiting ramifications, when we get to use ROMIO.
  */

  /* variables for reading the command line */
  extern char *optarg;
  char *name = NULL;
  int c;

  /* error handling: errno is declared by <errno.h> and used by perror */

  /* default buffer size: one block of BLOCK_SIZE integers */
  size_of_junk = blocks_per_read * BLOCK_SIZE * sizeof(int);

  /* getopt returns -1 when there are no more options */
  while ((c = getopt(argc, argv, "f:l:h")) != -1)
    switch(c) {
    case 'f':
      name = optarg;
      (void)printf("reading %s\n", name);
      break;
    case 'l':
      if ((1 != sscanf (optarg, "%d", &blocks_per_read)) ||
          (blocks_per_read < 1)) {
        SYNOPSIS;
        exit(1);
      }
      else {
        size_of_junk = blocks_per_read * BLOCK_SIZE * sizeof(int);
        printf("reading in chunks of size %d bytes\n", size_of_junk);
      }
      break;
    case 'h':
      SYNOPSIS;
      exit(0);
    case '?':
      SYNOPSIS;
      exit(1);
    }

  if (name == NULL) {
    SYNOPSIS;
    exit(2);
  }

  if (! (junk = (int*) malloc(size_of_junk))) {
    perror ("malloc");
    exit(3);
  }
  else
    printf("allocated %d bytes to junk\n", size_of_junk);

  if (! (fp = fopen(name, "r"))) {
    perror (name);
    exit(4);
  }

  count = 0LL;
  while (!feof(fp)){
    n_of_items = fread (junk, sizeof(int), blocks_per_read * BLOCK_SIZE, fp);
    if (ferror(fp)){
      perror("fread");
      CLOSE;
      exit (6);
    }
    count = (long long) (count + n_of_items);
  }

  printf ("read %lld bytes\n", (long long)(count * sizeof(int)));
  free((void *)junk);
  CLOSE;
  exit (0);
}
Let me explain briefly how the program works. We begin by reading the
command line. The options are as before, but this time
the option -l has a different meaning: it specifies the
size of the reading buffer in units of 4 MB (one block holds
BLOCK_SIZE = 1048576 4-byte integers). Because the size
of the reading buffer is no longer predefined, we have to allocate
it dynamically.
Observe that when I process the option -l, I use the number
of items matched by the function sscanf in order to detect
a possible error: anything other than one matched integer, or a value
less than 1, prints the synopsis and exits. The code that prints the
synopsis of the program is now represented by the macro SYNOPSIS, which
is defined in the preamble of the program.
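The sscanf check described above can be isolated in a small function.
This is only a minimal sketch: the name parse_blocks is hypothetical and
does not appear in the program.

```c
#include <stdio.h>

/* Hypothetical helper (not part of xrandfile.c): parse a positive
   block count from a string.  Returns 1 on success and stores the
   value in *out; returns 0 otherwise and leaves *out untouched. */
int parse_blocks(const char *text, int *out)
{
    int value;

    /* sscanf returns the number of conversions that succeeded, so
       anything other than 1 means that no integer was found. */
    if (sscanf(text, "%d", &value) != 1)
        return 0;

    /* A buffer of zero or a negative number of blocks makes no sense. */
    if (value < 1)
        return 0;

    *out = value;
    return 1;
}
```

Note that "%d" stops at the first character that is not part of a
number, so a string such as "10abc" is still accepted as 10. A stricter
parser would have to inspect the rest of the string as well.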
After we have finished reading the command line and checking
for possible input errors, we allocate space for the junk
array by calling malloc. If malloc returns NULL,
which means that it has failed to allocate the requested space, we
print an error message and exit. The amount of requested space
in bytes is stored in the variable size_of_junk, which
is calculated at the beginning of the program and then recalculated
if the -l option is used. This way we set a default
value for it, which is 4 MB.
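Because size_of_junk is an int, a large enough argument to -l makes the
product overflow before malloc ever sees it; the BUGS entry of the
manual page admits that there is no such check. A minimal sketch of one
possible guard follows. The function buffer_bytes is hypothetical and
not part of the program.

```c
#include <limits.h>

#define BLOCK_SIZE 1048576   /* integers per block, as in the listing */

/* Hypothetical helper (not part of xrandfile.c): return the buffer
   size in bytes for the given number of blocks, or 0 if the count is
   not positive or the result would not fit in an int. */
int buffer_bytes(int blocks_per_read)
{
    /* Do the arithmetic in long long so that it cannot overflow. */
    const long long bytes_per_block =
        (long long) BLOCK_SIZE * (long long) sizeof(int);

    if (blocks_per_read < 1)
        return 0;
    if (blocks_per_read > INT_MAX / bytes_per_block)
        return 0;

    return (int) (blocks_per_read * bytes_per_block);
}
```

With 4-byte integers, buffer_bytes(1) yields the 4 MB default and any
request of 512 blocks or more is rejected, since 512 * 4 MB = 2^31
exceeds INT_MAX.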
Having allocated the space, we open the file, this time for reading.
Then we initialize the variable count, which will be used
to count the number of integers found in the file, to 0LL. This notation
means ``zero in the long-long format''. We have to use ``long-long''
for count in order to count numbers of integers that may be
above INT_MAX, which is only 2^31 - 1 = 2147483647, i.e., about
2 billion. Our GPFS files may contain more than 2 billion integers.
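To make the arithmetic concrete, here is a small sketch that converts a
file size in bytes to a number of integers and checks whether that
number still fits in an int. Both function names are hypothetical, and
the 16 GB file size in the note below is only an illustrative
assumption.

```c
#include <limits.h>

/* Hypothetical helper: number of whole integers in a file of the
   given size in bytes. */
long long integers_in_file(long long file_bytes)
{
    return file_bytes / (long long) sizeof(int);
}

/* Hypothetical helper: 1 if the count can be stored in an int,
   0 otherwise. */
int fits_in_int(long long n_integers)
{
    return n_integers >= INT_MIN && n_integers <= INT_MAX;
}
```

With 4-byte integers, the 419430400-byte test file from the manual page
holds 104857600 integers, which fits comfortably in an int, whereas a
16 GB file would hold 4294967296 integers, which does not.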
The reading of the file is done within the while loop,
which checks for the end of file at the top of the loop.
We read chunks of data of size size_of_junk with fread,
which returns the number of items (in this case 32-bit integers)
read in a single operation. The number of items read is then added
to count.
But before we increment count we check whether the read has
returned an error. This is not done by inspecting the value
returned by fread, but by calling a special function,
ferror. This function operates on the file pointer, fp,
as does the function that detects the end-of-file condition, feof.
If there has been an error, we print the error message, then
attempt to close the file (this is what the CLOSE macro does),
and exit with status 6.
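The read loop can also be written as a function of its own. The sketch
below is a variant, not the code of the listing: instead of testing feof
at the top of the loop, it treats a short read as the signal to stop and
then asks ferror whether the cause was an error. The name count_ints is
hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper (a variant of the loop in xrandfile.c): count
   the whole int-sized items in an open stream, reading buffer_items
   items at a time.  Returns the count, or -1 on a read error or if
   buffer_items is zero. */
long long count_ints(FILE *fp, int *buffer, size_t buffer_items)
{
    long long count = 0LL;
    size_t n;

    if (buffer_items == 0)
        return -1LL;

    for (;;) {
        n = fread(buffer, sizeof(int), buffer_items, fp);
        count += (long long) n;

        if (n < buffer_items) {
            /* A short read means either an error or end of file;
               fread's return value alone cannot tell which. */
            if (ferror(fp))
                return -1LL;
            break;              /* end of file */
        }
    }
    return count;
}
```

The caller remains responsible for opening and closing the file and for
multiplying the result by sizeof(int) if it wants a byte count.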
Assuming that all has gone well, we exit the while loop
cleanly and print the number of bytes read from the file
on standard output. Then we free the allocated
storage, close the file, and exit.
The Makefile for this program looks exactly the same as the Makefile
for mkrandfile, with the exception that the name mkrandfile
is replaced with xrandfile.
The manual entry for the program looks as follows:
XRANDFILE(1)              I590 Programmer's Manual              XRANDFILE(1)

NAME
       xrandfile - read a file in chunks of various sizes and
       count number of bytes in it

SYNOPSIS
       xrandfile -f filename [ -l blocks-per-read ] [-h]

DESCRIPTION
       xrandfile reads a file in chunks of various sizes and
       counts number of bytes in it. Can be used for disk-based
       benchmarks.

OPTIONS
       -f     This parameter must be followed by input filename,
              e.g. foo.

       -l     This parameter is optional. It must be followed by
              an integer, which specifies the number of 4MB
              blocks per read. The default is one 4MB block per
              read.

       -h     Print a brief help message.

DIAGNOSTICS
       Self explanatory and numerous.

EXAMPLES
       $ xrandfile -f test -l 10
       reading test
       reading in chunks of size 41943040 bytes
       allocated 41943040 bytes to junk
       read 419430400 bytes
       $ ls -l test
       -rw-r--r-- 1 gustav ucs 419430400 Sep 12 16:20 test

BUGS
       Command line processing may be fragile. There is no checking
       if the requested buffer size does not exceed INT_MAX,
       although malloc may catch this.

AUTHOR
       Zdzislaw Meglicki <gustav@indiana.edu>

I590/7462                      SEPTEMBER 2003                   XRANDFILE(1)