- In order to understand what really happens inside
get_diffusion_matrix and expand_temp_profile,
we must learn more about PESSL subroutine fft,
which has some very special requirements.
- In order to evaluate
and
we must provide subroutine fft with values
of T0(x,y) and D(x,y) on a sufficiently dense
grid. We could use the same grid for
this purpose as the grid
which
we have already defined in
main in order to compute values of T(x,y, t) at
those points. But those two grids don't have to
be the same. Furthermore, there are special requirements
imposed by subroutine fft, as to the dimensions
of its own grid. And these are as follows:
- 1.
- Let nx and ny be the number of grid
points in x and y directions.
- 2.
- nx and ny must be a multiple
of a number of processors requested for
the computation.
- 3.
- nx and ny must be no greater than
37748736 and must be of the form:
where h, i, j, k, and m are integers and
- The purpose of subroutine
factor_nodes
and function minpower2 is to ensure that
these very special conditions are satisfied and that
a suitable grid can be generated taking our
original numbers nx and ny, which correspond
to the requested numbers of harmonics
in x and y, as a starting point. We shall discuss
how this is done exactly towards the end of this section.
For the time being let us simply assume that it's
done and that we have tweaked our input nx and ny so that they're now suitable. In other words, for the
FFT computation we generate a physical grid which has
the same number of points in x and y as
the requested number of harmonics in these directions.
- Subroutine
fft computes the following:
Now you can see why we have scaled our problem so that
it would fit into
.
- How to relate this double sum to:
The answer is in the scaling factor
,
which must be set so that
We have nx little segments between 0 and
and ditto for ny. Therefore
Consequently
If we make our
then
we'll evaluate
instead.
- This is exactly what happens below:
scale_f = 1.d0/ REAL(nx*ny, long)
ALLOCATE(df(nx,ny))
DO j = 1, ny
DO i =1, nx
df(i,j) = diff_coef((twopi*(i-1))/nx,(twopi*(j-1))/ny)
ENDDO
ENDDO
CALL fft(df,scale=scale_f,transpose='N')
After having determined the scaling factor
(our
,
theirs scale_f), we
evaluate
D(xi, yj) for
and
,
save the result on an
auxiliary variable df (which is distributed
the same way as f, i.e., (*,BLOCK),
but has different dimensions, i.e.,
,
whereas f has dimensions
),
and load the lot into fft.
- Subroutine
fft will return the numbers that
correspond to the Fourier transforms for
and
,
i.e.,
on the same matrix, df.
- Our next task is to evaluate matrix
F following the formula:
while at the same time collapsing the double
indexes into a single index according to
- But instead of using
we build two arrays ixi and iyi
which simply remember values of j and k that correspond to a given l. This is how it's done:
num_errors=0
ALLOCATE(ixi(dif_nx*dif_ny), stat=istat)
IF( istat .NE. 0 ) num_errors = num_errors + 1
ALLOCATE(iyi(dif_nx*dif_ny), stat=istat)
IF( istat .NE. 0 ) num_errors = num_errors + 1
DO ix = 1, dif_nx
DO iy = 1, dif_ny
i = (iy-1)* dif_nx + ix
ixi(i) = ix
iyi(i) = iy
ENDDO
ENDDO
- Now we can use these arrays in calculating Fij.
The code below has a double-nested
DO loop,
whose outer index runs over j and the inner index
runs over i. At the beginning of every loop
we find a pair (k, l) that corresponds to a given
j or i by looking up arrays iyp and ixp:
DO j = 1, dif_nx*dif_ny
iyp = iyi(j)
ixp = ixi(j)
DO i = 1, dif_nx*dif_ny
iy = iyi(i)
ix = ixi(i)
- Then we evaluate k - k' and j - j':
ixdiff = iabs(ix-ixp) + 1
iydiff = iabs(iy-iyp) + 1
Observe that
we evaluate the absolute values of these (remember the argument
that
F is symmetric) and then add 1.
Why? That's because the fft subroutine returns
results for
and
,
but
in our case
and
.
So, for this
reason we're shifting the indexes in the matrix returned
by fft by 1.
- The matrix Fij is now calculated
as follows:
f(i,j) = ( ( ix*ixp + iy*iyp*dif_ly_ratio ) * &
REAL(df(ixdiff, iydiff) - df(ix+ixp+1,iy+iyp+1)) &
+ ( ix*ixp - iy*iyp*dif_ly_ratio ) * &
REAL(df(ix+ixp+1,iydiff ) - df(ixdiff,iy+iyp+1)))
- Before returning from the subroutine we free space that
was allocated for auxiliary variables:
DEALLOCATE(df)
DEALLOCATE(ixi)
DEALLOCATE(iyi)
RETURN
END SUBROUTINE get_diffusion_matrix
- Let us now have a look at the next subroutine,
which evaluates
a0,
expand_temp_profile.
- The formula for
a0 is:
- This means that we have to scale our
parameter
in order to get the
factor. Setting
gave us
,
so multiplying
by
should yield exactly what's needed:
scale_f = -twopi / REAL( nx*ny,long)
- Now we create a temporary array, called
atmp,
which we're going to initialise to
on a grid in the (x, y) space with dimensions already made
to suit subroutine fft:
ALLOCATE(atmp(nx,ny), stat=istat )
IF( istat .NE. 0 ) THEN
WRITE(*,*) 'allocation failure in expand_temp_profile'
STOP
ENDIF
DO i = 1, nx
DO j = 1, ny
atmp(i,j) = init_temp((twopi*(i-1))/nx, (twopi*(j-1))/ny)
ENDDO
ENDDO
- The next step is simply to call subroutine
fft
passing atmp and scale_f to it:
CALL fft(atmp,scale=scale_f,transpose='N')
The coefficients akj(0) will be returned
on atmp.
- The last step is to collapse the indexes i and j onto a single index and transfer appropriate array
entries from
atmp to a:
DO j = 1, dif_nx * dif_ny
jx = MOD(j-1, dif_nx) + 2
jy = (j-1) / dif_nx + 2
a(j) = REAL(atmp(jx,jy),long)
ENDDO
Observe that this time we do it differently than
in subroutine get_diffusion_matrix.
- Before returning to the caller we free space used by
atmp:
DEALLOCATE(atmp)
RETURN