Next: Subroutine get_diffusion_matrix Up: Answers to Assignment 2 Previous: Answers to Assignment 2

### Subroutine expand_temp_profile

Let me begin from explanation of the first two executable lines in subroutine expand_temp_profile:

      nx = minpower2(2*(dif_nx + 1), idum)
ny = minpower2(2*(dif_ny + 1), idum)

Function minpower2, as will be shown below, returns the smallest 2n, which is greater than or equal to its first argument. n is then returned in the second argument.

The condition for the acceptable transform length is:

 n = 2h 3i 5j 7k 11m (3.62)

where

Clearly, we can always assume i = j = k = m = 0 and then the condition simplifies to n = 2h, where . 225 = 33,554,432, which is a pretty big number, and 224 = 16,777,216, so there is a fairly large gap between 225 and 224. But nobody is really going to use that many harmonics anyway. That gap can be filled, if you want to, by using combinations of i, j, k, and m different from 0.

So the first two lines of expand_temp_profile basically simplify condition (3.65) to the requirement that n has to be a power of 2, and, of course, that n has to be greater than or equal to the first argument. And about what that first argument is, I'll say a few more words later.

Observe that function expand_temp_profile does not check if n is the multiple of the number of nodes. You can call it a feature or a bug. Clearly the authors of the program simply didn't want to bother. Remember that this is a demonstration code, so what is shown here is a somewhat sloppy way of setting up n with minimum of robustness, minimum of checking, but also minimum of fuss.

I have instrumented that subroutine so as to show a few things as it runs. Here's a modified code fragment:

 ...
nx = minpower2( 2*(dif_nx+1), idum)
ny = minpower2( 2*(dif_ny+1), idum)
write(*,*) 'expand_temp_profile';
write(*,*) '   dif_nx = ', dif_nx
write(*,*) '   dif_ny = ', dif_ny
write(*,*) '   n_proc = ', NUMBER_OF_PROCESSORS()
write(*,*) '   nx     = ', nx
write(*,*) '   ny     = ', ny
scale_f = -twopi / REAL( nx*ny,long)
...


If you try to run this program on, say, 9 processors, subroutine fft will return with an error:

$poe diffusion -euidevice css0 -procs 9 -euilib ip -ilevel 3 \ > diffusion.log 2> diffusion.err$ cat diffusion.log
expand_temp_profile

dif_nx =  7
dif_ny =  7
n_proc =  9
nx     =  16
ny     =  16
FFT : 0040-188 Context(-1) Task(8) Process(-1,-1) Grid -1 x -1
The assumed shape array must be evenly distributed among the processes.
The column block size (2) of the assumed shape array (X) must be
equal to the number of columns (0) distributed on this process.

FFT : 0040-299 Context(-1) Task(8) Process(-1,-1) Grid -1 x -1
refer to Parallel ESSL Guide and Reference.
\$

The very fact that it is subroutine fft capturing this error and not our own subroutine expand_temp_profile can be considered a bug. But then, since the error is captured eventually and an intelligible message written out by fft, we can leave it at that.

The way this is handled in subroutine expand_temp_profile should be contrasted with a very meticulous checking done in subroutine get_diffusion_matrix. In principle we could employ any of the methods in both subroutines.

Now, why is the call to minpower2 of the form:

    nx = minpower2( 2*(dif_nx+1), idum)
ny = minpower2( 2*(dif_ny+1), idum)

and not
    nx = minpower2(dif_nx + 1, idum)
ny = minpower2(dif_ny + 1, idum)

The reason for this is that a discrete Fourier transform generates amplitudes for positive and for negative wave numbers. Amplitudes for positive wave numbers are placed in slots that correspond to l equal 1 through nx / 2 - 1 in the returned vector, amplitudes for negative wave numbers are placed in slots that correspond to l equal nx / 2 +1 through nx - 1 in the returned vector, and the amplitude for the so called Nyquist frequency is placed in the slot that corresponds to l = nx/2. The first element of the vector corresponds to l=0, that is to a constant. Consequently, if we want, say, 7 harmonics in the x direction, our vector returned by FFT should have at least 16 elements, that is:
1.
the l=0 term, i.e., a constant
2.
the l=1 term
3.
the l=2 term
4.
the l=3 term
5.
the l=4 term
6.
the l=5 term
7.
the l=6 term
8.
the l=7 term
9.
the l=8 term, i.e., the Nyquist frequency term
10.
the l= -7 term
11.
the l= -6 term
12.
the l= -5 term
13.
the l= -4 term
14.
the l= -3 term
15.
the l= -2 term
16.
the l= -1 term

In general FFT is so lightweight that it is quite OK to generate more harmonics than you really need, just to be safe.

Observe that when data is transferred from atmp to a we transfer only dif_nx * dif_ny harmonics regardless of how many nx * ny harmonics we have generated.

I have instrumented the code of expand_temp_profile again in order to view coefficients al returned by that subroutine for

1.

    nx = minpower2( 1*(dif_nx+1), idum)
ny = minpower2( 1*(dif_ny+1), idum)

2.

    nx = minpower2( 2*(dif_nx+1), idum)
ny = minpower2( 2*(dif_ny+1), idum)

3.

    nx = minpower2( 4*(dif_nx+1), idum)
ny = minpower2( 4*(dif_ny+1), idum)

and
4.

    nx = minpower2( 8*(dif_nx+1), idum)
ny = minpower2( 8*(dif_ny+1), idum)

Here's the modified code fragment:
    CALL fft(atmp,scale=scale_f,transpose='N')
DO jx = 2, nx
DO jy = 2, ny
WRITE(*, '(4x, "a(", 1i2, ",", 1i2, ") = ", &
&"(", e9.3, ", ", e9.3, ")")') jx - 1, jy - 1, atmp(jx, jy)
END DO
END DO
DO j =  1, dif_nx * dif_ny
jx = MOD(j-1, dif_nx) + 2
jy = (j-1) / dif_nx + 2
a(j) = REAL(atmp(jx,jy),long)
ENDDO


Now let us use the following namelist:

&input
ly_ratio = 1.0d0,
delta_t = 0.2,
numx = 5,
numy = 5,
nx = 7,
ny = 7,
numt = 10,
init_f = 8,
diff_f = 1,
/

Initial condition number 8 will default to:

where only is different from 0. Let us see what we are going to get for case 1:
 expand_temp_profile
dif_nx =  7
dif_ny =  7
n_proc =  8
nx     =  8
ny     =  8
a( 1, 1) = (0.157E+01, -.962E-16)
a( 2, 1) = (0.147E-32, 0.481E-16)
a( 3, 1) = (0.141E-15, -.481E-16)
a( 4, 1) = (0.147E-32, 0.481E-16)
a( 5, 1) = (-.141E-15, -.481E-16)
a( 6, 1) = (0.147E-32, 0.481E-16)
a( 7, 1) = (-.157E+01, -.616E-32)
a( 1, 2) = (0.147E-32, 0.481E-16)
a( 2, 2) = (-.147E-32, 0.000E+00)
a( 3, 2) = (0.147E-32, 0.468E-32)
a( 4, 2) = (-.147E-32, 0.000E+00)
a( 5, 2) = (0.147E-32, -.468E-32)
a( 6, 2) = (-.147E-32, 0.000E+00)
a( 7, 2) = (0.147E-32, -.481E-16)
a( 1, 3) = (0.182E-15, -.481E-16)
a( 2, 3) = (0.147E-32, 0.448E-32)
a( 3, 3) = (0.335E-16, -.916E-32)
a( 4, 3) = (0.147E-32, 0.448E-32)
a( 5, 3) = (-.335E-16, 0.207E-33)
a( 6, 3) = (0.147E-32, 0.448E-32)
a( 7, 3) = (-.182E-15, 0.481E-16)
a( 1, 4) = (0.147E-32, 0.481E-16)
a( 2, 4) = (-.147E-32, 0.000E+00)
a( 3, 4) = (0.147E-32, 0.468E-32)
a( 4, 4) = (-.147E-32, 0.000E+00)
a( 5, 4) = (0.147E-32, -.468E-32)
a( 6, 4) = (-.147E-32, 0.000E+00)
a( 7, 4) = (0.147E-32, -.481E-16)
a( 1, 5) = (-.182E-15, -.481E-16)
a( 2, 5) = (0.147E-32, -.448E-32)
a( 3, 5) = (-.335E-16, -.207E-33)
a( 4, 5) = (0.147E-32, -.448E-32)
a( 5, 5) = (0.335E-16, 0.916E-32)
a( 6, 5) = (0.147E-32, -.448E-32)
a( 7, 5) = (0.182E-15, 0.481E-16)
a( 1, 6) = (0.147E-32, 0.481E-16)
a( 2, 6) = (-.147E-32, 0.000E+00)
a( 3, 6) = (0.147E-32, 0.468E-32)
a( 4, 6) = (-.147E-32, 0.000E+00)
a( 5, 6) = (0.147E-32, -.468E-32)
a( 6, 6) = (-.147E-32, 0.000E+00)
a( 7, 6) = (0.147E-32, -.481E-16)
a( 1, 7) = (-.157E+01, 0.616E-32)
a( 2, 7) = (0.147E-32, -.481E-16)
a( 3, 7) = (-.141E-15, 0.481E-16)
a( 4, 7) = (0.147E-32, -.481E-16)
a( 5, 7) = (0.141E-15, 0.481E-16)
a( 6, 7) = (0.147E-32, -.481E-16)
a( 7, 7) = (0.157E+01, 0.962E-16)

All terms of the order of 10-15 or 10-32 can be safely considered zero. So the first thing to notice is that the resulting Fourier tranform   is real within the accuracy of the computation - as it should be. The only nonvanishing components are:

In other words

This is quite OK. We get the first term right, and then we also get three more terms with negative harmonics.

By multiplying dif_nx + 1 by a power of 2, i.e., 2, 4, 8, we don't break the condition that the length of the transform must be a power of 2, as long as dif_nx + 1 is a power of 2. If dif_nx + 1 is also a multiple of the number of nodes, we stay within that condition too, so that subroutine fft can be called safely. Going for case 2 results in:

 expand_temp_profile
dif_nx =  7
dif_ny =  7
n_proc =  8
nx     =  16
ny     =  16
a( 1, 1) = (0.157E+01, -.146E-15)
a( 2, 1) = (0.110E-16, -.773E-16)
a( 3, 1) = (0.223E-15, -.853E-16)
a( 4, 1) = (0.483E-16, 0.242E-16)
a( 5, 1) = (0.861E-16, 0.335E-16)
a( 6, 1) = (0.906E-17, 0.122E-15)
a( 7, 1) = (-.369E-16, 0.361E-16)
a( 1, 2) = (0.451E-18, -.773E-16)
a( 2, 2) = (0.272E-17, -.613E-17)
a( 3, 2) = (0.264E-17, 0.808E-17)
a( 4, 2) = (-.482E-18, -.241E-17)
a( 5, 2) = (0.359E-17, 0.425E-17)
a( 6, 2) = (-.681E-18, 0.385E-33)
a( 7, 2) = (-.217E-16, 0.127E-16)
a( 1, 3) = (0.333E-15, -.844E-16)
a( 2, 3) = (0.262E-17, 0.335E-17)
a( 3, 3) = (0.352E-16, 0.418E-16)
a( 4, 3) = (-.100E-16, 0.152E-17)
a( 5, 3) = (-.134E-16, -.698E-17)
a( 6, 3) = (-.263E-18, -.268E-17)
a( 7, 3) = (-.107E-16, -.740E-17)
a( 1, 4) = (0.327E-16, 0.237E-16)
a( 2, 4) = (-.482E-18, -.241E-17)
a( 3, 4) = (-.327E-16, 0.799E-17)
a( 4, 4) = (-.341E-17, 0.212E-32)
a( 5, 4) = (0.545E-17, -.158E-16)
a( 6, 4) = (-.482E-18, 0.241E-17)
a( 7, 4) = (0.132E-17, 0.357E-17)
a( 1, 5) = (0.000E+00, 0.916E-16)
a( 2, 5) = (-.186E-17, 0.163E-17)
a( 3, 5) = (-.179E-16, -.177E-16)
a( 4, 5) = (0.952E-17, -.152E-17)
a( 5, 5) = (0.245E-16, -.162E-16)
a( 6, 5) = (0.103E-17, -.230E-17)
a( 7, 5) = (0.258E-16, 0.182E-16)
a( 1, 6) = (-.159E-16, 0.123E-15)
a( 2, 6) = (-.681E-18, 0.000E+00)
a( 3, 6) = (0.128E-16, -.228E-16)
a( 4, 6) = (-.482E-18, 0.241E-17)
a( 5, 6) = (-.113E-16, 0.120E-16)
a( 6, 6) = (0.272E-17, 0.613E-17)
a( 7, 6) = (-.460E-17, -.144E-16)
a( 1, 7) = (-.131E-15, -.552E-16)
a( 2, 7) = (-.246E-18, 0.121E-17)
a( 3, 7) = (0.198E-16, -.116E-16)
a( 4, 7) = (0.134E-16, -.108E-18)
a( 5, 7) = (0.107E-17, 0.290E-16)
a( 6, 7) = (0.169E-17, 0.220E-17)
a( 7, 7) = (-.144E-16, -.312E-16)
...

This time all coefficients are zero with the exception of the first one, which is

so that

At the other end we get the negative harmonics:

You can check yourself that the same happens for cases 3 and 4.

Next: Subroutine get_diffusion_matrix Up: Answers to Assignment 2 Previous: Answers to Assignment 2
Zdzislaw Meglicki
2001-02-26