PFTP is a program very similar to FTP, but it can move data in parallel between parallel file systems such as GPFS and parallel massive data storage systems such as HPSS. PFTP clients are installed in /usr/local/hpss both on the IUB and on the IUPUI head nodes.
There are two PFTP clients in that directory: a kerberized version, called krb5_gss_pftp_client, and a non-kerberized one, called pftp_client. The kerberized version is secure, and it is the one you should normally use, but it doesn't work correctly at present due to some configuration glitches that still have to be resolved. I will nevertheless show you briefly how it should work.
In order to use the Kerberized version of PFTP you have to acquire your Kerberos credentials in the HPSS cell first. The name of the cell is dce1.indiana.edu. To acquire the credentials use the program kinit:
[gustav@bh1 gustav]$ kinit gustav@dce1.indiana.edu
Password for gustav@dce1.indiana.edu:
[gustav@bh1 gustav]$ klist
Ticket cache: FILE:/tmp/krb5cc_43098
Default principal: gustav@dce1.indiana.edu
Valid starting     Expires            Service principal
09/04/03 15:18:52  09/05/03 01:18:52  krbtgt/dce1.indiana.edu@dce1.indiana.edu
Kerberos 4 ticket cache: /tmp/tkt43098
klist: You have no tickets cached
[gustav@bh1 gustav]$
You can check the status of your Kerberos credentials with the command klist, as I have done in the example above.
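The ticket line printed by klist also tells you how long the credentials will last. Here is a minimal Python sketch that works this out from the line shown in the transcript. It assumes the MM/DD/YY HH:MM:SS layout seen here (other klist versions may print dates differently), and the function name ticket_lifetime_hours is made up for this example.

```python
from datetime import datetime

# Ticket line copied from the klist output in the transcript.
TICKET_LINE = ("09/04/03 15:18:52  09/05/03 01:18:52  "
               "krbtgt/dce1.indiana.edu@dce1.indiana.edu")


def ticket_lifetime_hours(line):
    """Return the lifetime in hours of the ticket described by one klist line.

    Assumes the first four whitespace-separated fields are the start date,
    start time, expiry date and expiry time, in MM/DD/YY HH:MM:SS form.
    """
    fields = line.split()
    fmt = "%m/%d/%y %H:%M:%S"
    start = datetime.strptime(" ".join(fields[0:2]), fmt)
    end = datetime.strptime(" ".join(fields[2:4]), fmt)
    return (end - start).total_seconds() / 3600.0


print(ticket_lifetime_hours(TICKET_LINE))
```

For the ticket in the transcript the start and expiry times are exactly ten hours apart.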
Once you have acquired the credentials, you can connect to HPSS as follows:
[gustav@bh1 gustav]$ krb5_gss_pftp_client hpss.iu.edu 4021
Parallel block size set to 4194304.
Connected to hpss.iu.edu.
[ Message of the day comes here. It may be out of date. ]
220 hpss01.ucs.indiana.edu FTP server (HPSS 4.3 PFTPD V1.1.1 Fri Jul 19 13:59:25 EST 2002) ready.
334 Using authentication type GSSAPI; ADAT must follow
GSSAPI accepted as authentication type
GSSAPI authentication succeeded
Preauthenticated FTP to hpss.iu.edu as gustav:
232 GSSAPI user /.../dce1.indiana.edu/gustav is authorized as /.../dce1.indiana.edu/gustav
230 User /.../dce1.indiana.edu/gustav logged in.
Remote system type is UNIX.
Using binary mode to transfer files.
[ Other messages come here. ]
Multinode is Disabled.
ftp> pwd
257 "/.../dce1.indiana.edu/fs/mirror/g/u/gustav" is current directory.
ftp> quit
221 Goodbye.
[gustav@bh1 gustav]$
The program krb5_gss_pftp_client passes your Kerberos credentials to the PFTP authentication helper, which runs, in this case, on hpss01.ucs.indiana.edu, and the helper completes your authentication and authorization. This is done in a very secure fashion: your Kerberos password never travels over the network, and you don't have to type it in explicitly anyway.
You can keep your Kerberos credentials as long as you need, but eventually they will expire after some 10 hours or so anyway. It is a good practice to destroy the credentials before you quit the system. To do so issue the command kdestroy:
[gustav@bh1 gustav]$ kdestroy
[gustav@bh1 gustav]$ klist
klist: No credentials cache found (ticket cache FILE:/tmp/krb5cc_43098)
Kerberos 4 ticket cache: /tmp/tkt43098
klist: You have no tickets cached
[gustav@bh1 gustav]$
Kerberos credentials acquired with kinit survive the logout process. The reason for this is to let you leave a background process that may depend on the credentials. Of course, if you have such a process running, you should not destroy the credentials.
Connecting with pftp_client instead of krb5_gss_pftp_client
is much the same, with the single exception of having to type in your
HPSS password manually. The password is transmitted over the network
without encryption. This is the principal reason for using the Kerberized
version.
The IUPUI cluster and the IUPUI HPSS component are at present better configured for moving data between them, and so we'll switch to the IUPUI cluster in the following examples.
But before we go any further, let us create a nice, large file containing randomized integers on GPFS:
[gustav@ih1 gustav]$ cd /N/gpfs/gustav
[gustav@ih1 gustav]$ mkrandfile -h
synopsis: mkrandfile -f <file> -l <length>
[gustav@ih1 gustav]$ time mkrandfile -f test -l 1000
writing on test
writing 1000 blocks of 1048576 random integers
real 5m19.838s
user 0m39.450s
sys 0m24.090s
[gustav@ih1 gustav]$ ls -l
total 4096000
-rw-r--r-- 1 gustav ucs 4194304000 Sep 4 15:46 test
[gustav@ih1 gustav]$
It took 5 minutes and 20 seconds to write this 4 GB file on GPFS, which yields about 13 MB/s. Some of this time was spent calling the random number generation function, but that should be much faster than IO. What transfer rate are we going to get on reading it?
[gustav@ih1 gustav]$ time cat test > /dev/null
real 0m29.001s
user 0m0.130s
sys 0m15.730s
[gustav@ih1 gustav]$
This time it took only 29 seconds, which yields a transfer rate of about 144 MB/s. This is very fast, and we might suspect either that the file is buffered or that UNIX cheats on the > /dev/null redirection and simply drops the file without reading it.
It is easy to check on the latter as follows:
[gustav@ih1 gustav]$ time cat test > test1
real 6m20.750s
user 0m0.040s
sys 0m21.780s
[gustav@ih1 gustav]$
Here we read 4 GB and, as we do so, we write 4 GB at the same time. If the 144 MB/s transfer rate was real, we should be able to do this in less than 5 minutes and 20 seconds (for writing) plus 30 seconds (for reading), which is 5 minutes and 50 seconds. We actually do it in 6 minutes and 20 seconds, i.e., 30 seconds longer. This is close enough, the more so as data flows through various memory buffers in this process and there is some time-consuming data copying involved. We can therefore guess that UNIX does not cheat on > /dev/null and that the reading indeed proceeds at about 144 MB/s.
Because there are four GPFS servers on the IUPUI AVIDD component, this yields about 36 MB/s per server, which is quite normal for reading from disk arrays.
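The arithmetic behind these estimates can be checked with a few lines of Python. This is only a sketch: the byte count and the times are copied from the transcripts above, and MB here means 10**6 bytes, which is the unit that reproduces the 13 MB/s and 144 MB/s figures.

```python
FILE_BYTES = 4194304000   # size of "test" reported by ls -l
GPFS_SERVERS = 4          # GPFS servers on the IUPUI AVIDD component


def to_seconds(minutes, seconds):
    """Convert a 'real' time printed by the shell's time command to seconds."""
    return 60 * minutes + seconds


def rate_mb_per_s(nbytes, seconds):
    """Transfer rate in decimal megabytes (10**6 bytes) per second."""
    return nbytes / seconds / 1e6


write_s = to_seconds(5, 19.838)   # time mkrandfile -f test -l 1000
read_s = to_seconds(0, 29.001)    # time cat test > /dev/null
copy_s = to_seconds(6, 20.750)    # time cat test > test1

write_rate = rate_mb_per_s(FILE_BYTES, write_s)
read_rate = rate_mb_per_s(FILE_BYTES, read_s)
per_server = read_rate / GPFS_SERVERS
# How much longer the copy took than a separate write plus a separate read.
overrun_s = copy_s - (write_s + read_s)

print(f"write rate:      {write_rate:.1f} MB/s")
print(f"read rate:       {read_rate:.1f} MB/s")
print(f"read per server: {per_server:.1f} MB/s")
print(f"copy overrun:    {overrun_s:.1f} s")
```

With the exact times from the transcripts, the copy overruns the sum of the separate write and read by a little under 32 seconds, which is the "30 seconds longer" of the text.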
Why then is writing to GPFS so slow compared to reading?
The reason for this is that writing is by necessity sequential. The system does not know a priori how long the file is going to be. The system receives data sequentially from the generating process and it writes it sequentially, as files are usually written, sending portions of data first to the first GPFS server, then to the second one, then to the third, fourth and then back to the first one, as the file gets striped over GPFS. In order to speed up this process, we would have to write the file in parallel - not sequentially - and we are going to learn how to do this down the road. But let us get back to HPSS and let us now transfer the file from GPFS to HPSS. Here is how I go about it:
[gustav@ih1 gustav]$ cd /N/gpfs/gustav
[gustav@ih1 gustav]$ ls -l
total 8192000
-rw-r--r-- 1 gustav ucs 4194304000 Sep 4 15:46 test
-rw-r--r-- 1 gustav ucs 4194304000 Sep 4 15:59 test1
So, here I went to GPFS on AVIDD-I (the IUPUI cluster) and checked that my 4 GB files are there. Now I am going to connect to HPSS using pftp_client:
[gustav@ih1 gustav]$ pftp_client hpss.iu.edu
Parallel block size set to 4194304.
Connected to hpss.iu.edu.
[ message of the day ]
220 hpss-s12.uits.iupui.edu FTP server (HPSS 4.3 PFTPD V1.1.1 Wed Sep 11 15:18:05 EST 2002) ready.
Name (hpss.iu.edu:gustav): gustav
331 Password required for /.../dce1.indiana.edu/gustav.
Password:
230 User /.../dce1.indiana.edu/gustav logged in.
Remote system type is UNIX.
Using binary mode to transfer files.
[ other messages ]
Multinode is Disabled.
OK, so now I'm in. I land in my HPSS home directory, which is linked to our institutional DFS. This directory is not suitable for very large files - you must never put anything as large as 4 GB there. Instead I switch to the so-called "hpssonly" directory, where I can place files of arbitrary size.
ftp> pwd
257 "/.../dce1.indiana.edu/fs/mirror/g/u/gustav" is current directory.
ftp> cd /:/hpssonly/g/u/gustav
250 CWD command successful.
Now I am going to activate the multinode transfer mode. This means that the data will flow in parallel. There is a file in /usr/local/etc, called HPSS.conf, which specifies the configuration and the nodes through which data should flow directly.
ftp> multinode
Processing the multinode list, please wait.....
Multinode is on.
It is not enough to request multinode transfer mode. You have to
specify how many data streams you want to have in the transfer. In our
case we have four HPSS servers and four GPFS servers, so we need
four parallel data streams:
ftp> setpwidth 4
Parallel stripe width set to (4).
Processing the multinode list, please wait.....
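To make the stripe width concrete, here is a minimal Python sketch of how a file could be divided among four streams. The block size of 4194304 bytes is the "Parallel block size" that pftp_client printed on connecting, and the file size is that of the file test. The round-robin assignment of blocks to streams is an assumption made for illustration only; nothing above says how PFTP actually distributes blocks.

```python
BLOCK_BYTES = 4194304      # "Parallel block size set to 4194304."
STRIPE_WIDTH = 4           # value passed to setpwidth
FILE_BYTES = 4194304000    # size of the file "test"


def blocks_per_stream(file_bytes, block_bytes, width):
    """Count the blocks each stream carries under round-robin assignment.

    Round-robin is an illustrative assumption, not documented PFTP behaviour.
    """
    n_blocks = -(-file_bytes // block_bytes)  # ceiling division
    counts = [0] * width
    for block in range(n_blocks):
        counts[block % width] += 1
    return counts


counts = blocks_per_stream(FILE_BYTES, BLOCK_BYTES, STRIPE_WIDTH)
print(counts)
```

The file is exactly 1000 blocks long, so under this assumption each of the four streams would carry 250 blocks.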
Now we need to request the HPSS Class of Service. This class is
actually a default when you make a connection to HPSS from AVIDD-I.
It is a striped class of service, over four HPSS servers, and it
is configured to handle very large files:
ftp> quote site setcos 45
200 COS set to 45.
Finally, we request the transfer itself. Observe that we don't
put the file, as we would if we worked with a normal FTP.
Instead we pput it. pput is the parallel version of
put. You can see responses from four PFTP client nodes and the
transfer commences.
ftp> pput test
200 Command Complete (4194304000, test, 0, 4, 4194304).
Processing the multinode list, please wait.....
200 Command Complete.
200 Command Complete.
200 Command Complete.
200 Command Complete.
150 Transfer starting.
226 Transfer Complete.(moved = 4194304000).
4194304000 bytes sent in 37.12 seconds (107.75 Mbytes/s)
200 Command Complete.
The transfer rate was nearly 108 MB/s. This is very fast for
FTP. In fact, you could not get such a high transfer rate with
a normal sequential FTP.
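The interactive steps of this upload can be collected into a list of commands. The Python sketch below only builds the text of the commands typed in this session. Whether pftp_client accepts commands on standard input, as many FTP clients do, is an assumption that is not confirmed here, and the function name upload_commands is made up for this example.

```python
def upload_commands(remote_dir, filename, width=4, cos=45):
    """Return the PFTP commands used in this session for a parallel upload.

    The defaults (stripe width 4, class of service 45) are the values
    used in the transcript; other sites may need different ones.
    """
    return [
        f"cd {remote_dir}",
        "multinode",
        f"setpwidth {width}",
        f"quote site setcos {cos}",
        f"pput {filename}",
        "quit",
    ]


script = "\n".join(upload_commands("/:/hpssonly/g/u/gustav", "test"))
print(script)
```

Keeping the stripe width and the class of service as parameters makes it easy to see which values are specific to the AVIDD-I and HPSS configuration described here.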
Let us check that the file is indeed in HPSS:
ftp> ls -l
200 PORT command successful.
150 Opening ASCII mode data connection for file list.
-rw-r----- 1 gustav 1000 365684740 Oct 9 2001 FRIDAYZONE1.MPG
-rw-r----- 1 gustav 1000 361170948 Oct 9 2001 FRIDAYZONE2.MPG
-rw-r----- 1 gustav 1000 1164392 Jul 17 1999 IU-HPSS.tar.gz
drwxr-x--- 2 gustav 1000 512 Jul 10 2001 MPIO
drwxr-x--- 2 gustav 1000 512 Feb 13 2001 new
-rw-r----- 1 gustav ovpit 4194304000 Sep 4 16:57 test
226 Transfer complete.
391 bytes received in 0.08 seconds (4.90 Kbytes/s)
ftp> quit
221 Goodbye.
[gustav@ih1 gustav]$
Well, it is there, all 4 GB of it.
Now, how do we get it back?
Proceed as before, but this time use pget instead of pput.
Here is the example. I begin by going back to my GPFS directory and
deleting the files test and test1 from it, then connect to HPSS as before.
[gustav@ih1 gustav]$ cd /N/gpfs/gustav
[gustav@ih1 gustav]$ ls
test test1
[gustav@ih1 gustav]$ rm *
[gustav@ih1 gustav]$ pftp_client hpss.iu.edu
Parallel block size set to 4194304.
Connected to hpss.iu.edu.
[ message of the day ]
220 hpss-s12.uits.iupui.edu FTP server
(HPSS 4.3 PFTPD V1.1.1 Wed Sep 11 15:18:05 EST 2002) ready.
Name (hpss.iu.edu:gustav): gustav
331 Password required for /.../dce1.indiana.edu/gustav.
Password:
230 User /.../dce1.indiana.edu/gustav logged in.
Remote system type is UNIX.
Using binary mode to transfer files.
[ other messages]
Multinode is Disabled.
ftp> cd /:/hpssonly/g/u/gustav
250 CWD command successful.
ftp> ls -l
200 PORT command successful.
150 Opening ASCII mode data connection for file list.
-rw-r----- 1 gustav 1000 365684740 Oct 9 2001 FRIDAYZONE1.MPG
-rw-r----- 1 gustav 1000 361170948 Oct 9 2001 FRIDAYZONE2.MPG
-rw-r----- 1 gustav 1000 1164392 Jul 17 1999 IU-HPSS.tar.gz
drwxr-x--- 2 gustav 1000 512 Jul 10 2001 MPIO
drwxr-x--- 2 gustav 1000 512 Feb 13 2001 new
-rw-r----- 1 gustav ovpit 4194304000 Sep 4 16:57 test
226 Transfer complete.
391 bytes received in 0.11 seconds (3.50 Kbytes/s)
ftp> multinode
Processing the multinode list, please wait.....
Multinode is on.
ftp> setpwidth 4
Parallel stripe width set to (4).
Processing the multinode list, please wait.....
Now I transfer the file back to AVIDD with the parallel version
of get, i.e., with pget:
ftp> pget test
200 Command Complete (4194304000, test, 0, 4, 4194304).
Processing the multinode list, please wait.....
200 Command Complete.
200 Command Complete.
200 Command Complete.
200 Command Complete.
150 Transfer starting.
226 Transfer Complete.(moved = 4194304000).
4194304000 bytes received in 2 minutes,3.57 seconds (32.37 Mbytes/s)
200 Command Complete.
ftp> quit
221 Goodbye.
[gustav@ih1 gustav]$ ls -l
total 4096000
-rw-r--r-- 1 gustav ucs 4194304000 Sep 4 17:14 test
[gustav@ih1 gustav]$
And I get my file back. The return transfer rate was only about
32 MB/s, which is about 8 MB/s per node. This is markedly slower than
the 108 MB/s we got on writing to HPSS, and it is caused by GPFS's poor
performance on writes. Still, it is better than the 13 MB/s
we saw when we wrote the file to GPFS sequentially.
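A caveat on units when comparing these figures: the rates printed by pftp_client match the byte count divided by 2**20 bytes per "Mbyte", whereas the 13 MB/s figure for the sequential write matches 10**6 bytes per MB. The Python sketch below recomputes the rates both ways from the byte counts and times in the transcripts. The interpretation of pftp_client's unit is inferred from the numbers alone, not taken from its documentation.

```python
FILE_BYTES = 4194304000

# (label, elapsed seconds), copied from the transcripts above
TIMINGS = [
    ("mkrandfile write to GPFS", 5 * 60 + 19.838),
    ("pput GPFS -> HPSS", 37.12),
    ("pget HPSS -> GPFS", 2 * 60 + 3.57),
]


def rates(nbytes, seconds):
    """Return (MB/s with 10**6 bytes, MiB/s with 2**20 bytes) for a transfer."""
    per_second = nbytes / seconds
    return per_second / 10**6, per_second / 2**20


for label, seconds in TIMINGS:
    mb, mib = rates(FILE_BYTES, seconds)
    print(f"{label}: {mb:.1f} MB/s = {mib:.1f} MiB/s")
```

On a like-for-like basis the parallel pget wrote to GPFS at about 32 MiB/s against about 12.5 MiB/s for the sequential write, so the conclusion in the text is unchanged.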
HPSS performs very well on these tests. You can see HPSS' performance on reads, uncontaminated by GPFS, if you drop the data on /dev/null:
ftp> multinode
Processing the multinode list, please wait.....
Multinode is on.
ftp> setpwidth 4
Parallel stripe width set to (4).
Processing the multinode list, please wait.....
ftp> pget test /dev/null
200 Command Complete (4194304000, test, 0, 4, 4194304).
Processing the multinode list, please wait.....
200 Command Complete.
200 Command Complete.
200 Command Complete.
200 Command Complete.
150 Transfer starting.
226 Transfer Complete.(moved = 4194304000).
4194304000 bytes received in 28.84 seconds (138.72 Mbytes/s)
200 Command Complete.
ftp>
The transfer rate on 4-way parallel reads from HPSS is nearly 140 MB/s -
if only there were a matching sink on the AVIDD side to drop the data on.
Very special thanks are due to the Distributed Storage Systems Group (DSSG) administrators, who invested a lot of effort, as well as a lot of skills and knowledge into getting these transfer rates so high (from the HPSS side).
In real life you should expect IO transfer rates to vary. They will depend on various environmental factors such as system load, GPFS load, network load and HPSS load as well.
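One way to see this variation is to repeat a measurement several times. The sketch below is a Python stand-in for the "time cat test > /dev/null" test used earlier: it reads a file from start to end and reports the slowest and the fastest pass. The function names, the 4 MB chunk size and the small scratch file are all choices made for this example. Repeated reads of a small file are likely to be served from memory buffers, so on a real system you would point it at a large file instead.

```python
import os
import tempfile
import time


def read_rate_mb_per_s(path, chunk_bytes=4 * 1024 * 1024):
    """Read the whole file once; return the rate in 10**6 bytes per second."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_bytes)
            if not chunk:
                break
            total += len(chunk)
    # Guard against a zero interval on very small files.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total / elapsed / 1e6


def measure(path, repeats=3):
    """Return (slowest, fastest) read rate over several passes."""
    rates = [read_rate_mb_per_s(path) for _ in range(repeats)]
    return min(rates), max(rates)


# Small scratch file standing in for a large file on GPFS.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(8 * 1024 * 1024))
try:
    low, high = measure(tmp.name)
    print(f"read rate between {low:.0f} and {high:.0f} MB/s")
finally:
    os.remove(tmp.name)
```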