ALLOCATABLE.
The dimensions of the arrays are obtained from a name
list by subroutine init_diffusion. We'll see how
that is done later.
dif_x and dif_y are one dimensional arrays
that contain x and y coordinates of points at which
temperature is going to be evaluated for various times.
The points are distributed on a 2D rectangular grid.
There are dif_numx points in the x direction
and dif_numy points in the y direction.
The total number of points is thus dif_numx * dif_numy.
The authors of the program chose to list all those points
as one long one-dimensional array, rather than a 2D matrix.
The one dimensional index that numbers entries in the
corresponding arrays xp and yp (where p stands for
a point of the grid)
is called ij and the formula that converts 2D matrix
indexes
i and j into ij is
ij = dif_numx*(j-1) + i
After having obtained basic parameters for the diffusion
program we allocate space for dif_x and dif_y
and then initialise their values so that
and
.
ALLOCATE( dif_x(dif_numx*dif_numy) ); ALLOCATE( dif_y(dif_numy*dif_numx) )
delx = PI/ ( dif_numx + 1.d0); dely = PI/ ( dif_numy + 1.d0)
DO i = 1, dif_numx
DO j = 1, dif_numy
ij = dif_numx*(j-1) + i
dif_x(ij) = delx* i; dif_y(ij) = dely * j
ENDDO
ENDDO
This way of doing things adds an unnecessary complexity to
the code and gains little, especially on a parallel machine
and in HPF, so you can just as well drop it and use
2D grid arrays instead. This is especially so when modifying
the program for writing images on HDF animation files.
init_diffusion should
have returned the required dimensions of the problem, i.e.,
the required number of points in the
The parameters dif_nx and dif_ny yield
the requested number of harmonics in x and y directions
respectively. Recall that our collapsed index l (equation
3.51) runs from 1 through
,
or, using
the program notation, from 1 through n = dif_nx * dif_ny.
The program uses a simple error control mechanism
accumulating a number of observed allocation problems
on num_errors and checking if it's greater than
0 at the end of this part of the code. The
ALLOCATE statement returns exit status on
stat=.
lambda(n) is the array of eigenvalues,
ALLOCATE(lambda(n),stat=stat); IF(stat.NE.0) num_errors=num_errors+1Observe that
lambda is not mentioned in
the DISTRIBUTE directives.
This means that this array is going to be
replicated on every node.
a(n) corresponds to what in our derivations
we have called
a0 = a(0), i.e., Fourier
coefficients that correspond to the initial
condition
T0(x, y) = T(x, y, 0):
!HPF$ DISTRIBUTE (BLOCK) :: a ... ALLOCATE(a(n), stat=stat); IF(stat.NE.0) num_errors=num_errors+1
ab(n) is the initial condition, i.e.,
a0 ``rotated'' into the eigen-basis
of
F. As you remember our formula
for that was:
lambda
the program uses symbol b for
b is going to be such):
In the code this is done as follows:
!HPF$ DISTRIBUTE (*,BLOCK) :: b
!HPF$ DISTRIBUTE (BLOCK) :: ab
...
ALLOCATE( ab(n), stat=stat); IF( stat .NE. 0) num_errors = num_errors + 1
...
ALLOCATE( b(n,n) , stat=stat ); IF( stat .NE. 0) num_errors = num_errors + 1
...
ab(i) = 0.d0
DO i = 1, n
DO j = 1, n
ab(i) = b(j,i)*a_rep(j) + ab(i)
ENDDO
ENDDO
...
The array a_rep is a replicated version
of a. Observe the following:
ALLOCATE( a_rep(n), stat=stat); IF( stat .NE. 0) num_errors = num_errors + 1 ... CALL expand_temp_profile(a) a_rep = a ...What do we gain by replicating
a_rep(j)?
Observe that b is distributed in the second
index only. Since a_rep is replicated
and not distributed, all those multiplications in
b(j,i)*a_rep(j) will be performed locally,
as will the outer loop (over i) because
ab(i) is distributed the same way
that b(*,i) is, i.e., in a BLOCK
fashion.
xsines and ysines.
These arrays are used in the final evaluation of
| = | ![]() |
(3.59) | |
| = | ![]() |
(3.60) |
dif_nx),
ny is the number of harmonics in the y direction (dif_ny), and
np is the total number of physical grid
points (dif_npts = dif_numx * dif_numy).
The first index of these arrays runs over the physical grid points, and the second index runs over the harmonics:
ALLOCATE( xsines(dif_npts, dif_nx) , stat=stat ); IF( stat .NE. 0) &
num_errors = num_errors + 1
ALLOCATE( ysines(dif_npts, dif_ny) , stat=stat ); IF( stat .NE. 0) &
num_errors = num_errors + 1
These arrays are not distributed either.
That is, they are local to every node.
The reason for this is the following computation
that is performed towards the end of the program:
DO it = 1, dif_ntemps
DO i = 1, dif_npts
iy = (i-1)/dif_nx+1; ix = MOD(i-1,dif_nx)+1
DO k = 1, dif_npts
temp(k,it) = temp(k,it) + xsines(k,ix) * ysines(k,iy) * x(i,it)
ENDDO
ENDDO
ENDDO
where the variables temp and x
are distributed in the second
index only. This means that the two inner loops
over i and k will be performed locally,
and only the outer loop, over it will be
distributed.
Observe the somewhat unusual work partitioning: it is the time slices that we're distributing between various processors so that then the whole history is calculated in parallel, simultaneously for various time slices.
The i index of the middle loop corresponds
to the power space index l, whereas the k
index of the innermost loop runs over all physical
grid points. The variable x(i,it)
corresponds to
.
!HPF$ DISTRIBUTE (*,BLOCK) :: temp
...
ALLOCATE(temp(dif_npts, dif_ntemps), stat=stat); IF(stat .NE. 0) &
num_errors = num_errors + 1
The first index of this array runs over all points
of the physical grid, and the second index numbers
consequtive instances of time.
!HPF$ DISTRIBUTE (*,BLOCK) :: f ... ALLOCATE( f(n,n) , stat=stat ); IF( stat .NE. 0) num_errors = num_errors + 1
b. We have
discussed the allocation of this distributed
array above, when describing ab.
abt contains the whole history
of coefficients
a(t) for various
instances of time in the eigen-basis. That is
these coefficients have not been rotated back onto
the original basis yet. In other words they are:
abt like temp is a distributed
array, where it is the various time instances that
have been distributed over multiple nodes.
!HPF$ DISTRIBUTE (*,BLOCK) :: abt
...
ALLOCATE( abt(n,dif_ntemps) , stat=stat ); IF( stat .NE. 0) &
num_errors = num_errors + 1
...
DO it = 1, dif_ntemps
i = 0
DO i = 1, n
abt(i,it) = EXP( -lambda(i) *(it-1)*dif_delta_t) * ab(i)
ENDDO
ENDDO
gemm, which takes abt on input.
Array x is distributed in the same way
as abt:
!HPF$ DISTRIBUTE (*,BLOCK) :: x
...
ALLOCATE( x(n,dif_ntemps) , stat=stat ); IF( stat .NE. 0) &
num_errors = num_errors + 1
...
CALL gemm(1.d0,b,abt,0.d0,x,'N','N')
...
DO it = 1, dif_ntemps
DO i = 1, dif_npts
iy = (i-1)/dif_nx+1; ix = MOD(i-1,dif_nx)+1
DO k = 1, dif_npts
temp(k,it) = temp(k,it) + xsines(k,ix) * ysines(k,iy) * x(i,it)
ENDDO
ENDDO
ENDDO
I have quoted again the computation of T, this
time in order to draw your attention to x, rather
than to xsines and ysines.
lambda(harmonics) replicated a(harmonics) distributed over harmonics ab(harmonics) distributed over harmonics a_rep(harmonics) replicated (copy of ab) xsines(grid, harmonics_in_x) replicated ysines(grid, harmonics_in_y) replicated temp(grid, time) distributed over time f(harmonics, harmonics) distributed over second harmonics arg b(harmonics, harmonics) distributed over second harmonics arg abt(harmonics, time) distributed over time x(harmonics, time) distributed over time
IF( num_errors .GT. 0 ) THEN WRITE(*,*) 'Error in allocating arrays in main'; STOP ENDIF
a (which stands
for
a0) is returned, it is gathered
on one of the processes and then scattered in full amongst
all participating processes. In HPF this is simply
done by saying:
a_rep = awhere
a is a DISTRIBUTED array, and
a_rep is a replicated array. All HPF
objects are replicated by default unless they
are either DISTRIBUTED or SEQUENCE.
In summary, the arrays a0 and F are evaluated by the following:
CALL expand_temp_profile(a); a_rep = a CALL get_diffusion_matrix(f)
DO loops evaluate
dsin is the DOUBLE PRECISION version
of sin.
DO i = 1, dif_nx
DO j = 1, dif_npts
t1 = i*dif_x(j); xsines(j,i) = SQRT(2.d0/pi) * dsin(t1)
ENDDO
ENDDO
DO i = 1, dif_ny
DO j = 1, dif_npts
t2 = i*dif_y(j); ysines(j,i) = SQRT(2.d0/pi) * dsin( t2)
ENDDO
ENDDO
syevx, which solves the eigenvalue
problem and returns both the eigenvalues and eigenvectors:
CALL syevx(f,lambda,'u', z=b)You will find this routine documented on page 598 of ``PESSL Guide and Reference''.
The routine computes eigenvalues and eigenvectors of a real symmetric matrix.
Is our matrix F real and symmetric? The matrix is certainly real, because of the way we have constructed it. The matrix is really based on sines and cosines, which we have replaced at some stage with equivalent Euler formulas employing exponenses. So we are not working here with auxiliary imaginary quantities that we would have to throw away at some stage. All entries of F are real.
Is F symmetric? Consider two indexes: l = (j - 1)nx + k and l' = (j' - 1)nx + k'. Switching l and l' around corresponds to switching j and j' and then k and k' around. But this would not affect expressions such as: k k' + l2r j j' and k k' - l2r j j' in front of the integrals. The integrals are expressed in terms of
| (3.61) |
To sum up
F is real and symmetric. Since it is
symmetric, this implies that
F = FT,
and thus
F is also normal, hence
the transformation
can be chosen to be
orthonormal.
syevx has
a somewhat complicated
synopsis. It takes three mandatory arguments and then
a plethora of optional arguments, all of which must be
flagged with appropriate keywords, such as abstol=,
or ifail=.
The mandatory arguments are:
'U' or
'L' (case doesn't matter), and which
tells syevx whether it will find
the valid values of
F in
upper or in lower part of matrix f.
Because that matrix is symmetric syevx
references only half of it
z=. If this option is requested, subroutine
syevx will return orthonormal eigenvectors of
F on exit in preallocated (and correctly
dimensioned) space given as an argument to this option.
In our case this is:
z=b
syevx we evaluate first
ab
and then
abt for various values of t:
ab(i) = 0.d0
DO i = 1, n
DO j = 1, n
ab(i) = b(j,i)*a_rep(j) + ab(i)
ENDDO
ENDDO
DO it = 1, dif_ntemps
i = 0
DO i = 1, n
abt(i,it) = EXP( -lambda(i) *(it-1)*dif_delta_t) * ab(i)
ENDDO
ENDDO
b,
there is a special
PESSL function called gemm, which stands for
General Matrix Matrix Product, and which can
do it for us easily and very efficiently.
See page 497 of ``PESSL Guide and Reference'' for
more information about it.
This subroutine,
depending on various switches with which it's been
called, will perform one of the following seven
operations:
We call this function in the following way:
CALL gemm(1.d0,b,abt,0.d0,x,'N','N')
And the meaning of various parameters passed to it is as follows:
b.
abt. The matrix here is of a peculiar type, because
its second index numbers time slices.
gemm that matrix
A should not be transposed.
gemm that matrix
B should not be transposed.
x). All that
needs to be done still is to convert it to values of
T(x, y, t).
This is done by the following two loops:
DO it = 1, dif_ntemps
DO k = 1, dif_npts
temp(k, it) = 0.d0
ENDDO
ENDDO
DO it = 1, dif_ntemps
DO i = 1, n
iy = (i-1)/dif_nx+1; ix = MOD(i-1,dif_nx)+1
DO k = 1, dif_npts
temp(k,it) = temp(k,it) + xsines(k,ix) * ysines(k,iy) * x(i,it)
ENDDO
ENDDO
ENDDO
WRITE statements simply write
all the results on standard output. They're not worth quoting here.
The only thing that you ought to remember is that in HPF programs only
one node is responsible for I/O. There is no parallel I/O in standard HPF. Yet.
So the HPF run-time system has to collect all those distributed arrays, and
temp is distributed so that various time slices live on various processors,
on a single node and then that node does printing on standard output or writing
on files.
When trying to link HDF subroutines with HPF programs you must not assume
a priori that HPF will do that for you though. You must declare all HDF related
variables as SEQUENCE.
DEALLOCATE(xsines); DEALLOCATE(ysines); DEALLOCATE(lambda) DEALLOCATE(temp); DEALLOCATE(a); DEALLOCATE(abt); DEALLOCATE(a) DEALLOCATE(f); DEALLOCATE(dif_x); DEALLOCATE(dif_y)
Observe that various arrays used in this program have been very thoughtfully distributed or replicated, as the case may be, in order to minimize communication between processes.