Let me begin from explanation of the first two executable lines
in subroutine expand_temp_profile:
nx = minpower2(2*(dif_nx + 1), idum)
ny = minpower2(2*(dif_ny + 1), idum)
Function minpower2, as will be shown below, returns the smallest
2n, which is greater than or equal to its first argument. n is
then returned in the second argument.
The condition for the acceptable transform length is:
So the first two lines of expand_temp_profile basically simplify
condition (3.65) to the requirement that n has to
be a power of 2, and, of course, that n has to be greater than or
equal to the first argument. And about what that first argument is,
I'll say a few more words later.
Observe that function expand_temp_profile does not check if
n is the multiple of the number of nodes. You can call it a feature
or a bug. Clearly the authors of the program simply didn't want to bother.
Remember that this is a demonstration code, so what is shown here
is a somewhat sloppy way of setting up n with minimum of robustness,
minimum of checking, but also minimum of fuss.
I have instrumented that subroutine so as to show a few things as it runs. Here's a modified code fragment:
...
nx = minpower2( 2*(dif_nx+1), idum)
ny = minpower2( 2*(dif_ny+1), idum)
write(*,*) 'expand_temp_profile';
write(*,*) ' dif_nx = ', dif_nx
write(*,*) ' dif_ny = ', dif_ny
write(*,*) ' n_proc = ', NUMBER_OF_PROCESSORS()
write(*,*) ' nx = ', nx
write(*,*) ' ny = ', ny
scale_f = -twopi / REAL( nx*ny,long)
...
If you try to run this program on, say, 9 processors, subroutine fft
will return with an error:
$ poe diffusion -euidevice css0 -procs 9 -euilib ip -ilevel 3 \
> diffusion.log 2> diffusion.err
$ cat diffusion.log
expand_temp_profile
dif_nx = 7
dif_ny = 7
n_proc = 9
nx = 16
ny = 16
FFT : 0040-188 Context(-1) Task(8) Process(-1,-1) Grid -1 x -1
The assumed shape array must be evenly distributed among the processes.
The column block size (2) of the assumed shape array (X) must be
equal to the number of columns (0) distributed on this process.
FFT : 0040-299 Context(-1) Task(8) Process(-1,-1) Grid -1 x -1
End of input-argument error reporting. For more information,
refer to Parallel ESSL Guide and Reference.
$
The very fact that it is subroutine fft capturing this error
and not our own subroutine expand_temp_profile can be considered
a bug. But then, since the error is captured eventually and an intelligible
message written out by fft, we can leave it at that.
The way this is handled in subroutine expand_temp_profile should
be contrasted with a very meticulous checking done in subroutine
get_diffusion_matrix. In principle we could employ any of the
methods in both subroutines.
Now, why is the call to minpower2 of the form:
nx = minpower2( 2*(dif_nx+1), idum)
ny = minpower2( 2*(dif_ny+1), idum)
and not
nx = minpower2(dif_nx + 1, idum)
ny = minpower2(dif_ny + 1, idum)
The reason for this is that a discrete Fourier transform generates
amplitudes for positive and for negative wave numbers. Amplitudes
for positive wave numbers are placed in slots that correspond to
l equal 1 through
nx / 2 - 1 in the returned vector,
amplitudes for negative wave
numbers are placed in slots that correspond to l equal
nx / 2 +1 through nx - 1 in the returned vector,
and the amplitude for the so called Nyquist frequency is placed
in the slot that corresponds to l = nx/2. The first element
of the vector corresponds to l=0, that is to a constant.
Consequently, if we want, say,
7 harmonics in the x direction, our vector returned by FFT
should have at least 16 elements, that is:
In general FFT is so lightweight that it is quite OK to generate more harmonics than you really need, just to be safe.
Observe that when data
is transferred from atmp to a we transfer only
dif_nx * dif_ny harmonics regardless of how many
nx * ny harmonics we have generated.
I have instrumented the code of expand_temp_profile again
in order to view coefficients al returned by that subroutine
for
nx = minpower2( 1*(dif_nx+1), idum)
ny = minpower2( 1*(dif_ny+1), idum)
nx = minpower2( 2*(dif_nx+1), idum)
ny = minpower2( 2*(dif_ny+1), idum)
nx = minpower2( 4*(dif_nx+1), idum)
ny = minpower2( 4*(dif_ny+1), idum)
and
nx = minpower2( 8*(dif_nx+1), idum)
ny = minpower2( 8*(dif_ny+1), idum)
CALL fft(atmp,scale=scale_f,transpose='N')
DO jx = 2, nx
DO jy = 2, ny
WRITE(*, '(4x, "a(", 1i2, ",", 1i2, ") = ", &
&"(", e9.3, ", ", e9.3, ")")') jx - 1, jy - 1, atmp(jx, jy)
END DO
END DO
DO j = 1, dif_nx * dif_ny
jx = MOD(j-1, dif_nx) + 2
jy = (j-1) / dif_nx + 2
a(j) = REAL(atmp(jx,jy),long)
ENDDO
Now let us use the following namelist:
&input ly_ratio = 1.0d0, delta_t = 0.2, numx = 5, numy = 5, nx = 7, ny = 7, numt = 10, init_f = 8, diff_f = 1, /Initial condition number 8 will default to:
expand_temp_profile
dif_nx = 7
dif_ny = 7
n_proc = 8
nx = 8
ny = 8
a( 1, 1) = (0.157E+01, -.962E-16)
a( 2, 1) = (0.147E-32, 0.481E-16)
a( 3, 1) = (0.141E-15, -.481E-16)
a( 4, 1) = (0.147E-32, 0.481E-16)
a( 5, 1) = (-.141E-15, -.481E-16)
a( 6, 1) = (0.147E-32, 0.481E-16)
a( 7, 1) = (-.157E+01, -.616E-32)
a( 1, 2) = (0.147E-32, 0.481E-16)
a( 2, 2) = (-.147E-32, 0.000E+00)
a( 3, 2) = (0.147E-32, 0.468E-32)
a( 4, 2) = (-.147E-32, 0.000E+00)
a( 5, 2) = (0.147E-32, -.468E-32)
a( 6, 2) = (-.147E-32, 0.000E+00)
a( 7, 2) = (0.147E-32, -.481E-16)
a( 1, 3) = (0.182E-15, -.481E-16)
a( 2, 3) = (0.147E-32, 0.448E-32)
a( 3, 3) = (0.335E-16, -.916E-32)
a( 4, 3) = (0.147E-32, 0.448E-32)
a( 5, 3) = (-.335E-16, 0.207E-33)
a( 6, 3) = (0.147E-32, 0.448E-32)
a( 7, 3) = (-.182E-15, 0.481E-16)
a( 1, 4) = (0.147E-32, 0.481E-16)
a( 2, 4) = (-.147E-32, 0.000E+00)
a( 3, 4) = (0.147E-32, 0.468E-32)
a( 4, 4) = (-.147E-32, 0.000E+00)
a( 5, 4) = (0.147E-32, -.468E-32)
a( 6, 4) = (-.147E-32, 0.000E+00)
a( 7, 4) = (0.147E-32, -.481E-16)
a( 1, 5) = (-.182E-15, -.481E-16)
a( 2, 5) = (0.147E-32, -.448E-32)
a( 3, 5) = (-.335E-16, -.207E-33)
a( 4, 5) = (0.147E-32, -.448E-32)
a( 5, 5) = (0.335E-16, 0.916E-32)
a( 6, 5) = (0.147E-32, -.448E-32)
a( 7, 5) = (0.182E-15, 0.481E-16)
a( 1, 6) = (0.147E-32, 0.481E-16)
a( 2, 6) = (-.147E-32, 0.000E+00)
a( 3, 6) = (0.147E-32, 0.468E-32)
a( 4, 6) = (-.147E-32, 0.000E+00)
a( 5, 6) = (0.147E-32, -.468E-32)
a( 6, 6) = (-.147E-32, 0.000E+00)
a( 7, 6) = (0.147E-32, -.481E-16)
a( 1, 7) = (-.157E+01, 0.616E-32)
a( 2, 7) = (0.147E-32, -.481E-16)
a( 3, 7) = (-.141E-15, 0.481E-16)
a( 4, 7) = (0.147E-32, -.481E-16)
a( 5, 7) = (0.141E-15, 0.481E-16)
a( 6, 7) = (0.147E-32, -.481E-16)
a( 7, 7) = (0.157E+01, 0.962E-16)
All terms of the order of 10-15 or 10-32 can be safely
considered zero. So the first thing to notice is that the
resulting Fourier tranform
is real within the accuracy of the
computation - as it should be. The only nonvanishing components
are:
By multiplying dif_nx + 1 by a power of 2, i.e., 2, 4, 8,
we don't break the condition that the length of the transform
must be a power of 2, as long as dif_nx + 1 is
a power of 2. If dif_nx + 1 is also a multiple of
the number of nodes, we stay within that condition too, so that
subroutine fft can be called safely. Going for
case 2 results in:
expand_temp_profile
dif_nx = 7
dif_ny = 7
n_proc = 8
nx = 16
ny = 16
a( 1, 1) = (0.157E+01, -.146E-15)
a( 2, 1) = (0.110E-16, -.773E-16)
a( 3, 1) = (0.223E-15, -.853E-16)
a( 4, 1) = (0.483E-16, 0.242E-16)
a( 5, 1) = (0.861E-16, 0.335E-16)
a( 6, 1) = (0.906E-17, 0.122E-15)
a( 7, 1) = (-.369E-16, 0.361E-16)
a( 1, 2) = (0.451E-18, -.773E-16)
a( 2, 2) = (0.272E-17, -.613E-17)
a( 3, 2) = (0.264E-17, 0.808E-17)
a( 4, 2) = (-.482E-18, -.241E-17)
a( 5, 2) = (0.359E-17, 0.425E-17)
a( 6, 2) = (-.681E-18, 0.385E-33)
a( 7, 2) = (-.217E-16, 0.127E-16)
a( 1, 3) = (0.333E-15, -.844E-16)
a( 2, 3) = (0.262E-17, 0.335E-17)
a( 3, 3) = (0.352E-16, 0.418E-16)
a( 4, 3) = (-.100E-16, 0.152E-17)
a( 5, 3) = (-.134E-16, -.698E-17)
a( 6, 3) = (-.263E-18, -.268E-17)
a( 7, 3) = (-.107E-16, -.740E-17)
a( 1, 4) = (0.327E-16, 0.237E-16)
a( 2, 4) = (-.482E-18, -.241E-17)
a( 3, 4) = (-.327E-16, 0.799E-17)
a( 4, 4) = (-.341E-17, 0.212E-32)
a( 5, 4) = (0.545E-17, -.158E-16)
a( 6, 4) = (-.482E-18, 0.241E-17)
a( 7, 4) = (0.132E-17, 0.357E-17)
a( 1, 5) = (0.000E+00, 0.916E-16)
a( 2, 5) = (-.186E-17, 0.163E-17)
a( 3, 5) = (-.179E-16, -.177E-16)
a( 4, 5) = (0.952E-17, -.152E-17)
a( 5, 5) = (0.245E-16, -.162E-16)
a( 6, 5) = (0.103E-17, -.230E-17)
a( 7, 5) = (0.258E-16, 0.182E-16)
a( 1, 6) = (-.159E-16, 0.123E-15)
a( 2, 6) = (-.681E-18, 0.000E+00)
a( 3, 6) = (0.128E-16, -.228E-16)
a( 4, 6) = (-.482E-18, 0.241E-17)
a( 5, 6) = (-.113E-16, 0.120E-16)
a( 6, 6) = (0.272E-17, 0.613E-17)
a( 7, 6) = (-.460E-17, -.144E-16)
a( 1, 7) = (-.131E-15, -.552E-16)
a( 2, 7) = (-.246E-18, 0.121E-17)
a( 3, 7) = (0.198E-16, -.116E-16)
a( 4, 7) = (0.134E-16, -.108E-18)
a( 5, 7) = (0.107E-17, 0.290E-16)
a( 6, 7) = (0.169E-17, 0.220E-17)
a( 7, 7) = (-.144E-16, -.312E-16)
...
This time all coefficients are zero with the exception of
the first one, which is
You can check yourself that the same happens for cases 3 and 4.