Let me begin from explanation of the first two executable lines
in subroutine expand_temp_profile
:
nx = minpower2(2*(dif_nx + 1), idum) ny = minpower2(2*(dif_ny + 1), idum)Function
minpower2
, as will be shown below, returns the smallest
2^{n}, which is greater than or equal to its first argument. n is
then returned in the second argument.
The condition for the acceptable transform length is:
So the first two lines of expand_temp_profile
basically simplify
condition (3.65) to the requirement that n has to
be a power of 2, and, of course, that n has to be greater than or
equal to the first argument. And about what that first argument is,
I'll say a few more words later.
Observe that function expand_temp_profile
does not check if
n is the multiple of the number of nodes. You can call it a feature
or a bug. Clearly the authors of the program simply didn't want to bother.
Remember that this is a demonstration code, so what is shown here
is a somewhat sloppy way of setting up n with minimum of robustness,
minimum of checking, but also minimum of fuss.
I have instrumented that subroutine so as to show a few things as it runs. Here's a modified code fragment:
... nx = minpower2( 2*(dif_nx+1), idum) ny = minpower2( 2*(dif_ny+1), idum) write(*,*) 'expand_temp_profile'; write(*,*) ' dif_nx = ', dif_nx write(*,*) ' dif_ny = ', dif_ny write(*,*) ' n_proc = ', NUMBER_OF_PROCESSORS() write(*,*) ' nx = ', nx write(*,*) ' ny = ', ny scale_f = -twopi / REAL( nx*ny,long) ...
If you try to run this program on, say, 9 processors, subroutine fft
will return with an error:
$ poe diffusion -euidevice css0 -procs 9 -euilib ip -ilevel 3 \ > diffusion.log 2> diffusion.err $ cat diffusion.log expand_temp_profile dif_nx = 7 dif_ny = 7 n_proc = 9 nx = 16 ny = 16 FFT : 0040-188 Context(-1) Task(8) Process(-1,-1) Grid -1 x -1 The assumed shape array must be evenly distributed among the processes. The column block size (2) of the assumed shape array (X) must be equal to the number of columns (0) distributed on this process. FFT : 0040-299 Context(-1) Task(8) Process(-1,-1) Grid -1 x -1 End of input-argument error reporting. For more information, refer to Parallel ESSL Guide and Reference. $The very fact that it is subroutine
fft
capturing this error
and not our own subroutine expand_temp_profile
can be considered
a bug. But then, since the error is captured eventually and an intelligible
message written out by fft
, we can leave it at that.
The way this is handled in subroutine expand_temp_profile
should
be contrasted with a very meticulous checking done in subroutine
get_diffusion_matrix
. In principle we could employ any of the
methods in both subroutines.
Now, why is the call to minpower2
of the form:
nx = minpower2( 2*(dif_nx+1), idum) ny = minpower2( 2*(dif_ny+1), idum)and not
nx = minpower2(dif_nx + 1, idum) ny = minpower2(dif_ny + 1, idum)The reason for this is that a discrete Fourier transform generates amplitudes for positive and for negative wave numbers. Amplitudes for positive wave numbers are placed in slots that correspond to l equal 1 through n_{x} / 2 - 1 in the returned vector, amplitudes for negative wave numbers are placed in slots that correspond to l equal n_{x} / 2 +1 through n_{x} - 1 in the returned vector, and the amplitude for the so called Nyquist frequency is placed in the slot that corresponds to l = n_{x}/2. The first element of the vector corresponds to l=0, that is to a constant. Consequently, if we want, say, 7 harmonics in the x direction, our vector returned by FFT should have at least 16 elements, that is:
In general FFT is so lightweight that it is quite OK to generate more harmonics than you really need, just to be safe.
Observe that when data
is transferred from atmp
to a
we transfer only
dif_nx * dif_ny
harmonics regardless of how many
nx * ny
harmonics we have generated.
I have instrumented the code of expand_temp_profile
again
in order to view coefficients a_{l} returned by that subroutine
for
nx = minpower2( 1*(dif_nx+1), idum) ny = minpower2( 1*(dif_ny+1), idum)
nx = minpower2( 2*(dif_nx+1), idum) ny = minpower2( 2*(dif_ny+1), idum)
nx = minpower2( 4*(dif_nx+1), idum) ny = minpower2( 4*(dif_ny+1), idum)and
nx = minpower2( 8*(dif_nx+1), idum) ny = minpower2( 8*(dif_ny+1), idum)
CALL fft(atmp,scale=scale_f,transpose='N') DO jx = 2, nx DO jy = 2, ny WRITE(*, '(4x, "a(", 1i2, ",", 1i2, ") = ", & &"(", e9.3, ", ", e9.3, ")")') jx - 1, jy - 1, atmp(jx, jy) END DO END DO DO j = 1, dif_nx * dif_ny jx = MOD(j-1, dif_nx) + 2 jy = (j-1) / dif_nx + 2 a(j) = REAL(atmp(jx,jy),long) ENDDO
Now let us use the following namelist:
&input ly_ratio = 1.0d0, delta_t = 0.2, numx = 5, numy = 5, nx = 7, ny = 7, numt = 10, init_f = 8, diff_f = 1, /Initial condition number 8 will default to:
expand_temp_profile dif_nx = 7 dif_ny = 7 n_proc = 8 nx = 8 ny = 8 a( 1, 1) = (0.157E+01, -.962E-16) a( 2, 1) = (0.147E-32, 0.481E-16) a( 3, 1) = (0.141E-15, -.481E-16) a( 4, 1) = (0.147E-32, 0.481E-16) a( 5, 1) = (-.141E-15, -.481E-16) a( 6, 1) = (0.147E-32, 0.481E-16) a( 7, 1) = (-.157E+01, -.616E-32) a( 1, 2) = (0.147E-32, 0.481E-16) a( 2, 2) = (-.147E-32, 0.000E+00) a( 3, 2) = (0.147E-32, 0.468E-32) a( 4, 2) = (-.147E-32, 0.000E+00) a( 5, 2) = (0.147E-32, -.468E-32) a( 6, 2) = (-.147E-32, 0.000E+00) a( 7, 2) = (0.147E-32, -.481E-16) a( 1, 3) = (0.182E-15, -.481E-16) a( 2, 3) = (0.147E-32, 0.448E-32) a( 3, 3) = (0.335E-16, -.916E-32) a( 4, 3) = (0.147E-32, 0.448E-32) a( 5, 3) = (-.335E-16, 0.207E-33) a( 6, 3) = (0.147E-32, 0.448E-32) a( 7, 3) = (-.182E-15, 0.481E-16) a( 1, 4) = (0.147E-32, 0.481E-16) a( 2, 4) = (-.147E-32, 0.000E+00) a( 3, 4) = (0.147E-32, 0.468E-32) a( 4, 4) = (-.147E-32, 0.000E+00) a( 5, 4) = (0.147E-32, -.468E-32) a( 6, 4) = (-.147E-32, 0.000E+00) a( 7, 4) = (0.147E-32, -.481E-16) a( 1, 5) = (-.182E-15, -.481E-16) a( 2, 5) = (0.147E-32, -.448E-32) a( 3, 5) = (-.335E-16, -.207E-33) a( 4, 5) = (0.147E-32, -.448E-32) a( 5, 5) = (0.335E-16, 0.916E-32) a( 6, 5) = (0.147E-32, -.448E-32) a( 7, 5) = (0.182E-15, 0.481E-16) a( 1, 6) = (0.147E-32, 0.481E-16) a( 2, 6) = (-.147E-32, 0.000E+00) a( 3, 6) = (0.147E-32, 0.468E-32) a( 4, 6) = (-.147E-32, 0.000E+00) a( 5, 6) = (0.147E-32, -.468E-32) a( 6, 6) = (-.147E-32, 0.000E+00) a( 7, 6) = (0.147E-32, -.481E-16) a( 1, 7) = (-.157E+01, 0.616E-32) a( 2, 7) = (0.147E-32, -.481E-16) a( 3, 7) = (-.141E-15, 0.481E-16) a( 4, 7) = (0.147E-32, -.481E-16) a( 5, 7) = (0.141E-15, 0.481E-16) a( 6, 7) = (0.147E-32, -.481E-16) a( 7, 7) = (0.157E+01, 0.962E-16)All terms of the order of 10^{-15} or 10^{-32} can be safely considered zero. So the first thing to notice is that the resulting Fourier tranform is real within the accuracy of the computation - as it should be. The only nonvanishing components are:
By multiplying dif_nx + 1
by a power of 2, i.e., 2, 4, 8,
we don't break the condition that the length of the transform
must be a power of 2, as long as dif_nx + 1
is
a power of 2. If dif_nx + 1
is also a multiple of
the number of nodes, we stay within that condition too, so that
subroutine fft
can be called safely. Going for
case 2 results in:
expand_temp_profile dif_nx = 7 dif_ny = 7 n_proc = 8 nx = 16 ny = 16 a( 1, 1) = (0.157E+01, -.146E-15) a( 2, 1) = (0.110E-16, -.773E-16) a( 3, 1) = (0.223E-15, -.853E-16) a( 4, 1) = (0.483E-16, 0.242E-16) a( 5, 1) = (0.861E-16, 0.335E-16) a( 6, 1) = (0.906E-17, 0.122E-15) a( 7, 1) = (-.369E-16, 0.361E-16) a( 1, 2) = (0.451E-18, -.773E-16) a( 2, 2) = (0.272E-17, -.613E-17) a( 3, 2) = (0.264E-17, 0.808E-17) a( 4, 2) = (-.482E-18, -.241E-17) a( 5, 2) = (0.359E-17, 0.425E-17) a( 6, 2) = (-.681E-18, 0.385E-33) a( 7, 2) = (-.217E-16, 0.127E-16) a( 1, 3) = (0.333E-15, -.844E-16) a( 2, 3) = (0.262E-17, 0.335E-17) a( 3, 3) = (0.352E-16, 0.418E-16) a( 4, 3) = (-.100E-16, 0.152E-17) a( 5, 3) = (-.134E-16, -.698E-17) a( 6, 3) = (-.263E-18, -.268E-17) a( 7, 3) = (-.107E-16, -.740E-17) a( 1, 4) = (0.327E-16, 0.237E-16) a( 2, 4) = (-.482E-18, -.241E-17) a( 3, 4) = (-.327E-16, 0.799E-17) a( 4, 4) = (-.341E-17, 0.212E-32) a( 5, 4) = (0.545E-17, -.158E-16) a( 6, 4) = (-.482E-18, 0.241E-17) a( 7, 4) = (0.132E-17, 0.357E-17) a( 1, 5) = (0.000E+00, 0.916E-16) a( 2, 5) = (-.186E-17, 0.163E-17) a( 3, 5) = (-.179E-16, -.177E-16) a( 4, 5) = (0.952E-17, -.152E-17) a( 5, 5) = (0.245E-16, -.162E-16) a( 6, 5) = (0.103E-17, -.230E-17) a( 7, 5) = (0.258E-16, 0.182E-16) a( 1, 6) = (-.159E-16, 0.123E-15) a( 2, 6) = (-.681E-18, 0.000E+00) a( 3, 6) = (0.128E-16, -.228E-16) a( 4, 6) = (-.482E-18, 0.241E-17) a( 5, 6) = (-.113E-16, 0.120E-16) a( 6, 6) = (0.272E-17, 0.613E-17) a( 7, 6) = (-.460E-17, -.144E-16) a( 1, 7) = (-.131E-15, -.552E-16) a( 2, 7) = (-.246E-18, 0.121E-17) a( 3, 7) = (0.198E-16, -.116E-16) a( 4, 7) = (0.134E-16, -.108E-18) a( 5, 7) = (0.107E-17, 0.290E-16) a( 6, 7) = (0.169E-17, 0.220E-17) a( 7, 7) = (-.144E-16, -.312E-16) ...This time all coefficients are zero with the exception of the first one, which is
You can check yourself that the same happens for cases 3 and 4.