- In order to understand what really happens inside
`get_diffusion_matrix`

and`expand_temp_profile`

, we must learn more about PESSL subroutine`fft`

, which has some very special requirements. - In order to evaluate

and

we must provide subroutine`fft`

with values of*T*_{0}(*x*,*y*) and*D*(*x*,*y*) on a sufficiently dense grid. We could use the same grid for this purpose as the grid which we have already defined in`main`

in order to compute values of*T*(*x*,*y*,*t*) at those points. But those two grids*don't have to be the same*. Furthermore, there are special requirements imposed by subroutine`fft`

, as to the dimensions of its own grid. And these are as follows:- 1.
- Let
*n*_{x}and*n*_{y}be the number of grid points in*x*and*y*directions. - 2.
*n*_{x}and*n*_{y}must be a*multiple*of a number of processors requested for the computation.- 3.
*n*_{x}and*n*_{y}must be no greater than 37748736 and must be of the form:

where*h*,*i*,*j*,*k*, and*m*are integers and

- The purpose of subroutine
`factor_nodes`

and function`minpower2`

is to ensure that these very special conditions are satisfied and that a suitable grid can be generated taking our*original*numbers*n*_{x}and*n*_{y}, which correspond to the requested numbers of*harmonics*in*x*and*y*, as a starting point. We shall discuss how this is done exactly towards the end of this section. For the time being let us simply assume that it's done and that we have tweaked our input*n*_{x}and*n*_{y}so that they're now suitable. In other words, for the FFT computation we generate a physical grid which has the same number of points in*x*and*y*as the requested number of harmonics in these directions. - Subroutine
`fft`

computes the following:

Now you can see why we have scaled our problem so that it would fit into . - How to relate this double sum to:

The answer is in the scaling factor , which must be set so that

We have*n*_{x}little segments between 0 and and ditto for*n*_{y}. Therefore

Consequently

If we make our then we'll evaluate

instead. - This is exactly what happens below:
scale_f = 1.d0/ REAL(nx*ny, long) ALLOCATE(df(nx,ny)) DO j = 1, ny DO i =1, nx df(i,j) = diff_coef((twopi*(i-1))/nx,(twopi*(j-1))/ny) ENDDO ENDDO CALL fft(df,scale=scale_f,transpose='N')

After having determined the scaling factor (our , theirs`scale_f`

), we evaluate*D*(*x*_{i},*y*_{j}) for and , save the result on an auxiliary variable`df`

(which is distributed the same way as`f`

, i.e.,`(*,BLOCK)`

, but has different dimensions, i.e., , whereas`f`

has dimensions ), and load the lot into`fft`

. - Subroutine
`fft`

will return the numbers that correspond to the Fourier transforms for and , i.e.,

on the same matrix,`df`

. - Our next task is to evaluate matrix
**F**following the formula:

while at the same time collapsing the double indexes into a single index according to

- But instead of using

we build two arrays`ixi`

and`iyi`

which simply*remember*values of*j*and*k*that correspond to a given*l*. This is how it's done:num_errors=0 ALLOCATE(ixi(dif_nx*dif_ny), stat=istat) IF( istat .NE. 0 ) num_errors = num_errors + 1 ALLOCATE(iyi(dif_nx*dif_ny), stat=istat) IF( istat .NE. 0 ) num_errors = num_errors + 1 DO ix = 1, dif_nx DO iy = 1, dif_ny i = (iy-1)* dif_nx + ix ixi(i) = ix iyi(i) = iy ENDDO ENDDO

- Now we can use these arrays in calculating
*F*^{i}_{j}. The code below has a double-nested`DO`

loop, whose outer index runs over*j*and the inner index runs over*i*. At the beginning of every loop we find a pair (*k*,*l*) that corresponds to a given*j*or*i*by looking up arrays`iyp`

and`ixp`

:DO j = 1, dif_nx*dif_ny iyp = iyi(j) ixp = ixi(j) DO i = 1, dif_nx*dif_ny iy = iyi(i) ix = ixi(i)

- Then we evaluate
*k*-*k*' and*j*-*j*':ixdiff = iabs(ix-ixp) + 1 iydiff = iabs(iy-iyp) + 1

Observe that we evaluate the absolute values of these (remember the argument that**F**is symmetric) and then add 1. Why? That's because the`fft`

subroutine returns results for and , but in our case and . So, for this reason we're shifting the indexes in the matrix returned by`fft`

by 1. - The matrix
*F*^{i}_{j}is now calculated as follows:f(i,j) = ( ( ix*ixp + iy*iyp*dif_ly_ratio ) * & REAL(df(ixdiff, iydiff) - df(ix+ixp+1,iy+iyp+1)) & + ( ix*ixp - iy*iyp*dif_ly_ratio ) * & REAL(df(ix+ixp+1,iydiff ) - df(ixdiff,iy+iyp+1)))

- Before returning from the subroutine we free space that
was allocated for auxiliary variables:
DEALLOCATE(df) DEALLOCATE(ixi) DEALLOCATE(iyi) RETURN END SUBROUTINE get_diffusion_matrix

- Let us now have a look at the next subroutine,
which evaluates
**a**_{0},`expand_temp_profile`

. - The formula for
**a**_{0}is:

- This means that we have to scale our
parameter
in order to get the
factor. Setting
gave us
,
so multiplying
by
should yield exactly what's needed:
scale_f = -twopi / REAL( nx*ny,long)

- Now we create a temporary array, called
`atmp`

, which we're going to initialise to on a grid in the (*x*,*y*) space with dimensions already made to suit subroutine`fft`

:ALLOCATE(atmp(nx,ny), stat=istat ) IF( istat .NE. 0 ) THEN WRITE(*,*) 'allocation failure in expand_temp_profile' STOP ENDIF DO i = 1, nx DO j = 1, ny atmp(i,j) = init_temp((twopi*(i-1))/nx, (twopi*(j-1))/ny) ENDDO ENDDO

- The next step is simply to call subroutine
`fft`

passing`atmp`

and`scale_f`

to it:CALL fft(atmp,scale=scale_f,transpose='N')

The coefficients*a*_{kj}(0) will be returned on`atmp`

. - The last step is to collapse the indexes
*i*and*j*onto a single index and transfer appropriate array entries from`atmp`

to`a`

:DO j = 1, dif_nx * dif_ny jx = MOD(j-1, dif_nx) + 2 jy = (j-1) / dif_nx + 2 a(j) = REAL(atmp(jx,jy),long) ENDDO

Observe that this time we do it differently than in subroutine`get_diffusion_matrix`

. - Before returning to the caller we free space used by
`atmp`

:DEALLOCATE(atmp) RETURN