
Using PFTP on AVIDD

PFTP is a program very similar to FTP, but it can move data in parallel between parallel file systems, such as GPFS, and parallel massive data storage systems, such as HPSS. PFTP clients are installed in /usr/local/hpss on both the IUB and the IUPUI head nodes.

There are two PFTP clients in that directory: a Kerberized version, called

krb5_gss_pftp_client

and a non-Kerberized one, called

pftp_client

The Kerberized version is secure and is what you should normally use, but at present it does not work correctly, due to some configuration glitches that still have to be resolved. I will nevertheless show you briefly how it should work.
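
Because the transcripts below invoke the clients by their bare names, it is convenient to put that directory on your PATH first. This is a minimal sketch, assuming the executables live directly in /usr/local/hpss rather than in a bin subdirectory underneath it:

[gustav@bh1 gustav]$ export PATH=$PATH:/usr/local/hpss
[gustav@bh1 gustav]$ which krb5_gss_pftp_client pftp_client  # both should now resolve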

In order to use the Kerberized version of PFTP you have to acquire your Kerberos credentials in the HPSS cell first. The name of the cell is dce1.indiana.edu. To acquire the credentials use the program kinit:

[gustav@bh1 gustav]$ kinit gustav@dce1.indiana.edu 
Password for gustav@dce1.indiana.edu: 
[gustav@bh1 gustav]$ klist
Ticket cache: FILE:/tmp/krb5cc_43098
Default principal: gustav@dce1.indiana.edu

Valid starting     Expires            Service principal
09/04/03 15:18:52  09/05/03 01:18:52  krbtgt/dce1.indiana.edu@dce1.indiana.edu


Kerberos 4 ticket cache: /tmp/tkt43098
klist: You have no tickets cached
[gustav@bh1 gustav]$
You can check the status of your Kerberos credentials with the command klist, as I have done in the example above.

Once you have acquired the credentials, you can connect to HPSS as follows:

[gustav@bh1 gustav]$ krb5_gss_pftp_client hpss.iu.edu 4021
Parallel block size set to 4194304.
Connected to hpss.iu.edu.

   [ Message of the day comes here. It may be out of date. ]

220 hpss01.ucs.indiana.edu FTP server 
   (HPSS 4.3 PFTPD V1.1.1 Fri Jul 19 13:59:25 EST 2002) ready.
334 Using authentication type GSSAPI; ADAT must follow
GSSAPI accepted as authentication type
GSSAPI authentication succeeded
Preauthenticated FTP to hpss.iu.edu as gustav: 
232 GSSAPI user /.../dce1.indiana.edu/gustav is authorized as 
   /.../dce1.indiana.edu/gustav
230 User /.../dce1.indiana.edu/gustav logged in.
Remote system type is UNIX.
Using binary mode to transfer files.

   [ Other messages come here. ]

Multinode is Disabled.
ftp> pwd
257 "/.../dce1.indiana.edu/fs/mirror/g/u/gustav" is current directory.
ftp> quit
221 Goodbye.
[gustav@bh1 gustav]$
The krb5_gss_pftp_client program passes your Kerberos credentials to the PFTP authentication helper that runs, in this case, on hpss01.ucs.indiana.edu, and the latter completes your authentication and authorization. This is done in a very secure fashion: your Kerberos password never travels over the network, and you don't have to type it in explicitly either.

You can keep your Kerberos credentials for as long as you need them, although they will expire on their own after some 10 hours or so. It is good practice to destroy the credentials before you leave the system. To do so issue the command kdestroy:

[gustav@bh1 gustav]$ kdestroy
[gustav@bh1 gustav]$ klist
klist: No credentials cache found (ticket cache FILE:/tmp/krb5cc_43098)


Kerberos 4 ticket cache: /tmp/tkt43098
klist: You have no tickets cached
[gustav@bh1 gustav]$
Kerberos credentials acquired with kinit survive the logout process. The reason for this is to let you leave behind a background process that may depend on the credentials. Of course, if you have such a process running, you should not destroy the credentials.
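
For example, you could start a long-running job under nohup and log out while it runs; the credentials remain valid until they expire or you destroy them. This is only a sketch, in which transfer.sh stands for a hypothetical script of yours that calls one of the PFTP clients:

[gustav@bh1 gustav]$ kinit gustav@dce1.indiana.edu
Password for gustav@dce1.indiana.edu:
[gustav@bh1 gustav]$ nohup ./transfer.sh > transfer.log 2>&1 &
[gustav@bh1 gustav]$ logout

Only after the background job has finished should you run kdestroy.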

Connecting  with pftp_client instead of krb5_gss_pftp_client is much the same, with the single exception of having to type in your HPSS password manually. The password is transmitted over the network without encryption. This is the principal reason for using the Kerberized version.

At present the IUPUI cluster and the IUPUI HPSS component are better configured for moving data between them, so we'll switch to the IUPUI cluster for the following examples.

But before we go any further, let us create a nice, large file containing randomized integers on GPFS:

[gustav@ih1 gustav]$ cd /N/gpfs/gustav
[gustav@ih1 gustav]$ mkrandfile -h
synopsis: mkrandfile -f <file> -l <length>
[gustav@ih1 gustav]$ time mkrandfile -f test -l 1000
writing on test
writing 1000 blocks of 1048576 random integers

real    5m19.838s
user    0m39.450s
sys     0m24.090s
[gustav@ih1 gustav]$ ls -l
total 4096000
-rw-r--r--    1 gustav   ucs      4194304000 Sep  4 15:46 test
[gustav@ih1 gustav]$
It took 5 minutes and 20 seconds to write this 4 GB file to GPFS, which corresponds to about 13 MB/s. Some of that time was spent calling the random number generator, but generating the numbers should be much faster than the I/O. What transfer rate are we going to get on reading the file back?
[gustav@ih1 gustav]$ time cat test > /dev/null

real    0m29.001s
user    0m0.130s
sys     0m15.730s
[gustav@ih1 gustav]$
This time it took only 29 seconds, which yields a transfer rate of about 144 MB/s. This is very fast, and we might suspect that the file is either buffered or that UNIX cheats on the > /dev/null redirection and simply drops the data without reading it.

It is easy to check on the latter as follows:

[gustav@ih1 gustav]$ time cat test > test1

real    6m20.750s
user    0m0.040s
sys     0m21.780s
[gustav@ih1 gustav]$
Here we read 4 GB and, as we do so, write 4 GB at the same time. If the 144 MB/s read rate were real, we should be able to do this in about 5 minutes and 20 seconds (for writing) plus 30 seconds (for reading), which is 5 minutes and 50 seconds. We actually do it in 6 minutes and 20 seconds, i.e., 30 seconds longer. This is close enough, the more so as the data flows through various memory buffers in this process and some time-consuming copying is involved. We can therefore conclude that UNIX does not cheat on > /dev/null and that reading indeed proceeds at about 144 MB/s.

Because there are four GPFS servers on the IUPUI AVIDD component, this yields about 36 MB/s per server, which is quite normal for reading from disk arrays.
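
These figures follow directly from the byte counts and wall-clock times reported above. For the record, here is a quick check with bc, counting in decimal megabytes:

echo '4194304000 / 319.838 / 10^6' | bc -l      # writing: about 13 MB/s
echo '4194304000 / 29.001 / 10^6' | bc -l       # reading: about 144 MB/s
echo '4194304000 / 29.001 / 10^6 / 4' | bc -l   # per GPFS server: about 36 MB/s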

Why then is writing to GPFS so slow compared to reading?

The reason for this is that writing is of necessity sequential. The system does not know a priori how long the file is going to be. It receives data sequentially from the generating process and writes it sequentially, as files are usually written, sending successive portions of data to the first GPFS server, then to the second, the third, the fourth, and then back to the first, as the file gets striped over GPFS. To speed this process up we would have to write the file in parallel, not sequentially, and we are going to learn how to do this down the road. But let us get back to HPSS and transfer the file from GPFS to HPSS. Here is how I go about it:

[gustav@ih1 gustav]$ cd /N/gpfs/gustav
[gustav@ih1 gustav]$ ls -l
total 8192000
-rw-r--r--    1 gustav   ucs      4194304000 Sep  4 15:46 test
-rw-r--r--    1 gustav   ucs      4194304000 Sep  4 15:59 test1
So, here I went to GPFS on AVIDD-I (the IUPUI cluster) and checked that my 4 GB files are there. Now I am going to connect to HPSS using pftp_client:
[gustav@ih1 gustav]$ pftp_client hpss.iu.edu
Parallel block size set to 4194304.
Connected to hpss.iu.edu.

   [ message of the day ]

220 hpss-s12.uits.iupui.edu FTP server 
   (HPSS 4.3 PFTPD V1.1.1 Wed Sep 11 15:18:05 EST 2002) ready.
Name (hpss.iu.edu:gustav): gustav
331 Password required for /.../dce1.indiana.edu/gustav.
Password:
230 User /.../dce1.indiana.edu/gustav logged in.
Remote system type is UNIX.
Using binary mode to transfer files.

   [ other messages ]

Multinode is Disabled.
OK, so now I'm in. I land in my HPSS home directory, which is linked to our institutional DFS. This directory is not suitable for very large files: you must never put anything as large as 4 GB there. Instead I switch to the so-called "hpssonly" directory, where I can place files of arbitrary size.
ftp> pwd
257 "/.../dce1.indiana.edu/fs/mirror/g/u/gustav" is current directory.
ftp> cd /:/hpssonly/g/u/gustav
250 CWD command successful.
Now I am going to activate the multinode transfer mode. This means that the data will flow in parallel. There is a file in /usr/local/etc, called HPSS.conf, which specifies the configuration, in particular the nodes through which the data should flow directly.
ftp> multinode
        Processing the  multinode list, please wait.....
Multinode is on.
It is not enough to request multinode transfer mode. You have to specify how many data streams you want to have in the transfer. In our case we have four HPSS servers and four GPFS servers, so we need four parallel data streams:
ftp> setpwidth 4
Parallel stripe width set to (4).
        Processing the  multinode list, please wait.....
Now we need to request the HPSS Class of Service. This class is actually the default when you connect to HPSS from AVIDD-I. It is a class of service striped over four HPSS servers, and it is configured to handle very large files:
ftp> quote site setcos 45
200 COS set to 45.
Finally, we request the transfer itself. Observe that we don't put the file, as we would with normal FTP; instead we pput it, pput being the parallel version of put. You can see the responses from the four PFTP client nodes, and then the transfer commences.
ftp> pput test
200 Command Complete (4194304000, test, 0, 4, 4194304).
        Processing the  multinode list, please wait.....
200 Command Complete.
200 Command Complete.
200 Command Complete.
200 Command Complete.
150 Transfer starting.
226 Transfer Complete.(moved = 4194304000).
4194304000 bytes sent in 37.12 seconds (107.75 Mbytes/s)
200 Command Complete.
The transfer rate was nearly 108 MB/s. This is very fast for FTP. In fact, you could not get such a high transfer rate with a normal sequential FTP.

Let us check that the file is indeed in HPSS:

ftp> ls -l
200 PORT command successful.
150 Opening ASCII mode data connection for file list.
-rw-r-----   1 gustav   1000   365684740 Oct  9 2001  FRIDAYZONE1.MPG
-rw-r-----   1 gustav   1000   361170948 Oct  9 2001  FRIDAYZONE2.MPG
-rw-r-----   1 gustav   1000     1164392 Jul 17 1999  IU-HPSS.tar.gz
drwxr-x---   2 gustav   1000         512 Jul 10 2001  MPIO
drwxr-x---   2 gustav   1000         512 Feb 13 2001  new
-rw-r-----   1 gustav   ovpit 4194304000 Sep  4 16:57 test
226 Transfer complete.
391 bytes received in 0.08 seconds (4.90 Kbytes/s)
ftp> quit
221 Goodbye.
[gustav@ih1 gustav]$
Well, it is there, all 4 GB of it.

Now, how to get it back?

Proceed as before, but this time use pget instead of pput. Here is an example. I begin by going back to my GPFS directory and deleting the files test and test1 from it, then I connect to HPSS as before.

[gustav@ih1 gustav]$ cd /N/gpfs/gustav
[gustav@ih1 gustav]$ ls
test  test1
[gustav@ih1 gustav]$ rm *
[gustav@ih1 gustav]$ pftp_client hpss.iu.edu
Parallel block size set to 4194304.
Connected to hpss.iu.edu.

   [ message of the day ]

220 hpss-s12.uits.iupui.edu FTP server 
   (HPSS 4.3 PFTPD V1.1.1 Wed Sep 11 15:18:05 EST 2002) ready.
Name (hpss.iu.edu:gustav): gustav
331 Password required for /.../dce1.indiana.edu/gustav.
Password:
230 User /.../dce1.indiana.edu/gustav logged in.
Remote system type is UNIX.
Using binary mode to transfer files.

   [ other messages]

Multinode is Disabled.
ftp> cd /:/hpssonly/g/u/gustav
250 CWD command successful.
ftp> ls -l
200 PORT command successful.
150 Opening ASCII mode data connection for file list.
-rw-r-----   1 gustav   1000   365684740 Oct  9 2001  FRIDAYZONE1.MPG
-rw-r-----   1 gustav   1000   361170948 Oct  9 2001  FRIDAYZONE2.MPG
-rw-r-----   1 gustav   1000     1164392 Jul 17 1999  IU-HPSS.tar.gz
drwxr-x---   2 gustav   1000         512 Jul 10 2001  MPIO
drwxr-x---   2 gustav   1000         512 Feb 13 2001  new
-rw-r-----   1 gustav   ovpit 4194304000 Sep  4 16:57 test
226 Transfer complete.
391 bytes received in 0.11 seconds (3.50 Kbytes/s)
ftp> multinode
        Processing the  multinode list, please wait.....
Multinode is on.
ftp> setpwidth 4
Parallel stripe width set to (4).
        Processing the  multinode list, please wait.....
Now I transfer the file back to AVIDD with the parallel version of get, i.e., with pget:
ftp> pget test
200 Command Complete (4194304000, test, 0, 4, 4194304).
        Processing the  multinode list, please wait.....
200 Command Complete.
200 Command Complete.
200 Command Complete.
200 Command Complete.
150 Transfer starting.
226 Transfer Complete.(moved = 4194304000).
4194304000 bytes received in 2 minutes,3.57 seconds (32.37 Mbytes/s)
200 Command Complete.
ftp> quit
221 Goodbye.
[gustav@ih1 gustav]$ ls -l
total 4096000
-rw-r--r--    1 gustav   ucs      4194304000 Sep  4 17:14 test
[gustav@ih1 gustav]$
And I get my file back. The return transfer rate was only about 32 MB/s, that is, about 8 MB/s per node. This is markedly slower than the 108 MB/s we got when writing to HPSS, and it is caused by GPFS's poor write performance. Still, it is better than the 13 MB/s we saw in our earlier experiments.

HPSS performs very well on these tests. You can see HPSS's read performance, uncontaminated by GPFS, if you drop the data onto /dev/null:

ftp> multinode
        Processing the  multinode list, please wait.....
Multinode is on.
ftp> setpwidth 4
Parallel stripe width set to (4).
        Processing the  multinode list, please wait.....
ftp> pget test /dev/null
200 Command Complete (4194304000, test, 0, 4, 4194304).
        Processing the  multinode list, please wait.....
200 Command Complete.
200 Command Complete.
200 Command Complete.
200 Command Complete.
150 Transfer starting.
226 Transfer Complete.(moved = 4194304000).
4194304000 bytes received in 28.84 seconds (138.72 Mbytes/s)
200 Command Complete.
ftp>
The transfer rate on 4-way parallel reads from HPSS is nearly 140 MB/s, if only there were a matching sink on the AVIDD side to drop the data into.

Very special thanks are due to the Distributed Storage Systems Group (DSSG) administrators, who invested a great deal of effort, skill, and knowledge into getting these transfer rates so high on the HPSS side.

In real life you should expect I/O transfer rates to vary. They depend on environmental factors such as system load, GPFS load, network load, and HPSS load.
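
Finally, if you need to run such a transfer unattended, for example from a batch job, the interactive session shown above can be scripted. The sketch below rests on two assumptions that you should verify with the DSSG administrators: that valid Kerberos credentials have already been acquired with kinit, as shown earlier, and that the PFTP clients, like ordinary FTP clients, read their commands from standard input when it is not a terminal.

#!/bin/sh
# Sketch of an unattended transfer from GPFS to HPSS; see the assumptions above.
cd /N/gpfs/gustav || exit 1
krb5_gss_pftp_client hpss.iu.edu 4021 <<'EOF'
cd /:/hpssonly/g/u/gustav
multinode
setpwidth 4
quote site setcos 45
pput test
quit
EOF

Launched under nohup, as sketched earlier in this section, such a script keeps running after you log out, as long as the credentials have not been destroyed.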

