ALLOCATABLE. The dimensions of the arrays are obtained from a namelist by subroutine init_diffusion. We'll see how that is done later.
dif_x and dif_y are one-dimensional arrays that contain the x and y coordinates of the points at which the temperature is going to be evaluated for various times. The points are distributed on a 2D rectangular grid: there are dif_numx points in the x direction and dif_numy points in the y direction, so the total number of points is dif_numx * dif_numy. The authors of the program chose to list all those points as one long one-dimensional array, rather than as a 2D matrix. The one-dimensional index that numbers entries in the corresponding arrays xp and yp (where p stands for a point of the grid) is called ij, and the formula that converts the 2D matrix indexes (i, j) into it is

   ij = dif_numx*(j-1) + i
After having obtained the basic parameters of the diffusion program we allocate space for dif_x and dif_y and then initialise their values so that the grid points are spaced evenly:

   ALLOCATE( dif_x(dif_numx*dif_numy) ); ALLOCATE( dif_y(dif_numy*dif_numx) )
   delx = PI / ( dif_numx + 1.d0 ); dely = PI / ( dif_numy + 1.d0 )
   DO i = 1, dif_numx
      DO j = 1, dif_numy
         ij = dif_numx*(j-1) + i
         dif_x(ij) = delx * i; dif_y(ij) = dely * j
      ENDDO
   ENDDO

This way of doing things adds an unnecessary complexity to the code and gains little, especially on a parallel machine and in HPF, so you can just as well drop it and use 2D grid arrays instead. This is especially so when modifying the program for writing images to HDF animation files.
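The index bookkeeping can be sketched outside Fortran too. Here is a minimal Python illustration (numx and numy are hypothetical stand-ins for dif_numx and dif_numy) showing that the flattening formula and its inverse, used later in the program, agree:

```python
import math

numx, numy = 4, 3                      # stand-ins for dif_numx, dif_numy
delx = math.pi / (numx + 1.0)
dely = math.pi / (numy + 1.0)

# Flatten the 2D grid into 1D arrays, using ij = numx*(j-1) + i (1-based).
xs = [0.0] * (numx * numy)
ys = [0.0] * (numx * numy)
for i in range(1, numx + 1):
    for j in range(1, numy + 1):
        ij = numx * (j - 1) + i
        xs[ij - 1] = delx * i
        ys[ij - 1] = dely * j

# Invert the flattening, as the program later does with / and MOD:
for ij in range(1, numx * numy + 1):
    ix = (ij - 1) % numx + 1           # MOD(ij-1, numx) + 1
    iy = (ij - 1) // numx + 1          # (ij-1)/numx + 1
    assert xs[ij - 1] == delx * ix
    assert ys[ij - 1] == dely * iy
```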
Since init_diffusion should have returned the required dimensions of the problem, i.e., the required number of points in space and the required number of harmonics, we are now ready to allocate as much space as is needed.
dif_nx and dif_ny are the requested numbers of harmonics in the x and y directions respectively. Recall that our collapsed index l (equation 3.51) runs from 1 through nx * ny, or, in the program notation, from 1 through n = dif_nx * dif_ny.
The program uses a simple error-control mechanism: it accumulates the number of observed allocation problems in num_errors and checks whether it is greater than 0 at the end of this part of the code. The ALLOCATE statement returns its exit status in the variable given to the stat= keyword.
lambda(n) is the array of eigenvalues of matrix F:

   ALLOCATE( lambda(n), stat=stat ); IF( stat .NE. 0 ) num_errors = num_errors + 1

Observe that lambda is not mentioned in the DISTRIBUTE directives. This means that this array is going to be replicated on every node.
a(n) corresponds to what in our derivations we have called a0 = a(0), i.e., the Fourier coefficients that correspond to the initial condition T0(x, y) = T(x, y, 0):

   !HPF$ DISTRIBUTE (BLOCK) :: a
   ...
   ALLOCATE( a(n), stat=stat ); IF( stat .NE. 0 ) num_errors = num_errors + 1
ab(n) is the initial condition, i.e., a0 ``rotated'' into the eigen-basis of F. As you remember, our formula for that was ab = Z^T a0, where next to lambda the program uses the symbol b for Z, the orthonormal matrix of eigenvectors of F (recall that b is going to be such). In the code this is done as follows:

   !HPF$ DISTRIBUTE (*,BLOCK) :: b
   !HPF$ DISTRIBUTE (BLOCK) :: ab
   ...
   ALLOCATE( ab(n), stat=stat ); IF( stat .NE. 0 ) num_errors = num_errors + 1
   ...
   ALLOCATE( b(n,n), stat=stat ); IF( stat .NE. 0 ) num_errors = num_errors + 1
   ...
   DO i = 1, n
      ab(i) = 0.d0
      DO j = 1, n
         ab(i) = b(j,i)*a_rep(j) + ab(i)
      ENDDO
   ENDDO
   ...

The array a_rep is a replicated version of a. Observe the following:

   ALLOCATE( a_rep(n), stat=stat ); IF( stat .NE. 0 ) num_errors = num_errors + 1
   ...
   CALL expand_temp_profile(a)
   a_rep = a
   ...

What do we gain by replicating
a? Observe that b is distributed in the second index only. Since a_rep is replicated and not distributed, all those multiplications in b(j,i)*a_rep(j) will be performed locally, as will the outer loop (over i): ab(i) is distributed the same way that b(*,i) is, i.e., in a BLOCK fashion, so each node works on its own section of ab without any communication.

xsines and ysines are the arrays of the x and y sines. These arrays are used in the final evaluation of T(x, y, t). They are dimensioned xsines(np, nx) and ysines(np, ny), where nx is the number of harmonics in the x direction (dif_nx), ny is the number of harmonics in the y direction (dif_ny), and np is the total number of physical grid points (dif_npts = dif_numx * dif_numy).
The first index of these arrays runs over the physical grid points, and the second index runs over the harmonics:
   ALLOCATE( xsines(dif_npts, dif_nx), stat=stat ); IF( stat .NE. 0 ) &
      num_errors = num_errors + 1
   ALLOCATE( ysines(dif_npts, dif_ny), stat=stat ); IF( stat .NE. 0 ) &
      num_errors = num_errors + 1

These arrays are not distributed either. That is, they are local to every node. The reason for this is the following computation that is performed towards the end of the program:
   DO it = 1, dif_ntemps
      DO i = 1, n
         iy = (i-1)/dif_nx+1; ix = MOD(i-1,dif_nx)+1
         DO k = 1, dif_npts
            temp(k,it) = temp(k,it) + xsines(k,ix) * ysines(k,iy) * x(i,it)
         ENDDO
      ENDDO
   ENDDO

where the variables temp and x are distributed in the second index only. This means that the two inner loops, over i and over k, will be performed locally, and only the outer loop, over it, will be distributed.
Observe the somewhat unusual work partitioning: it is the time slices that are distributed between the processors, so that the whole history is calculated in parallel, simultaneously for various time slices.
The i index of the middle loop corresponds to the collapsed index l, whereas the k index of the innermost loop runs over all physical grid points. The variable temp is declared and allocated as follows:

   !HPF$ DISTRIBUTE (*,BLOCK) :: temp
   ...
   ALLOCATE( temp(dif_npts, dif_ntemps), stat=stat ); IF( stat .NE. 0 ) &
      num_errors = num_errors + 1

The first index of this array runs over all points of the physical grid, and the second index numbers consecutive instants of time.
f(n,n) is the matrix F itself:

   !HPF$ DISTRIBUTE (*,BLOCK) :: f
   ...
   ALLOCATE( f(n,n), stat=stat ); IF( stat .NE. 0 ) num_errors = num_errors + 1

b(n,n) is the matrix of eigenvectors of F. We have discussed the allocation of this distributed array above, when describing ab.
abt(n, dif_ntemps) contains the whole history of the coefficients a(t) for the various instants of time in the eigen-basis. That is, these coefficients have not been rotated back onto the original basis yet. In other words, they are

   ab_l(t) = exp( -lambda_l * t ) * ab_l

with t = (it - 1) * dif_delta_t. Like temp, abt is a distributed array, where it is the various time instants that have been distributed over multiple nodes:

   !HPF$ DISTRIBUTE (*,BLOCK) :: abt
   ...
   ALLOCATE( abt(n, dif_ntemps), stat=stat ); IF( stat .NE. 0 ) &
      num_errors = num_errors + 1
   ...
   DO it = 1, dif_ntemps
      DO i = 1, n
         abt(i,it) = EXP( -lambda(i)*(it-1)*dif_delta_t ) * ab(i)
      ENDDO
   ENDDO
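In the eigen-basis each coefficient evolves independently, which is exactly what the loop above computes. A small Python sketch with made-up values (lam, ab, delta_t and ntemps are hypothetical stand-ins for lambda, ab, dif_delta_t and dif_ntemps):

```python
import math

lam = [1.0, 5.0]      # stand-in for lambda(i), made-up eigenvalues
ab = [1.0, 1.0]       # stand-in for ab(i)
delta_t = 0.1         # stand-in for dif_delta_t
ntemps = 3            # stand-in for dif_ntemps

# abt[i][it-1] = exp(-lambda_i * (it-1)*delta_t) * ab_i, it = 1..ntemps
abt = [[math.exp(-lam[i] * (it - 1) * delta_t) * ab[i]
        for it in range(1, ntemps + 1)]
       for i in range(len(lam))]

assert abt[0][0] == 1.0 and abt[1][0] == 1.0   # at t = 0 nothing has decayed
assert abt[1][2] < abt[0][2]                   # larger eigenvalues decay faster
```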
The rotation back onto the original basis is performed by gemm, which takes abt on input. Array x is distributed in the same way as abt:

   !HPF$ DISTRIBUTE (*,BLOCK) :: x
   ...
   ALLOCATE( x(n, dif_ntemps), stat=stat ); IF( stat .NE. 0 ) &
      num_errors = num_errors + 1
   ...
   CALL gemm(1.d0, b, abt, 0.d0, x, 'N', 'N')
   ...
   DO it = 1, dif_ntemps
      DO i = 1, n
         iy = (i-1)/dif_nx+1; ix = MOD(i-1,dif_nx)+1
         DO k = 1, dif_npts
            temp(k,it) = temp(k,it) + xsines(k,ix) * ysines(k,iy) * x(i,it)
         ENDDO
      ENDDO
   ENDDO

I have quoted the computation of T again, this time in order to draw your attention to x, rather than to
temp. In summary, here is how the various arrays of the program are laid out:

   lambda(harmonics)               replicated
   a(harmonics)                    distributed over harmonics
   ab(harmonics)                   distributed over harmonics
   a_rep(harmonics)                replicated (copy of a)
   xsines(grid, harmonics_in_x)    replicated
   ysines(grid, harmonics_in_y)    replicated
   temp(grid, time)                distributed over time
   f(harmonics, harmonics)         distributed over second harmonics argument
   b(harmonics, harmonics)         distributed over second harmonics argument
   abt(harmonics, time)            distributed over time
   x(harmonics, time)              distributed over time
   IF( num_errors .GT. 0 ) THEN
      WRITE(*,*) 'Error in allocating arrays in main'; STOP
   ENDIF
When a (which stands for a0) is returned, it is gathered on one of the processes and then scattered in full amongst all participating processes. In HPF this is simply done by saying

   a_rep = a

where a_rep is a replicated array. All HPF objects are replicated by default, unless they are either distributed or aligned with distributed objects.
In summary, the arrays a0 and F are evaluated by the following:
   CALL expand_temp_profile(a); a_rep = a
   CALL get_diffusion_matrix(f)
The arrays xsines and ysines are then filled using dsin, the DOUBLE PRECISION version of SIN:

   DO i = 1, dif_nx
      DO j = 1, dif_npts
         t1 = i*dif_x(j); xsines(j,i) = SQRT(2.d0/pi) * dsin(t1)
      ENDDO
   ENDDO
   DO i = 1, dif_ny
      DO j = 1, dif_npts
         t2 = i*dif_y(j); ysines(j,i) = SQRT(2.d0/pi) * dsin(t2)
      ENDDO
   ENDDO
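The fill of xsines can be mirrored in Python like this (xs stands for a few hypothetical dif_x values and nx for dif_nx; all numbers are made up):

```python
import math

xs = [0.5, 1.0, 1.5]      # hypothetical dif_x values
nx = 2                    # hypothetical dif_nx

# xsines[j][i-1] = sqrt(2/pi) * sin(i * x_j), for harmonics i = 1..nx
xsines = [[math.sqrt(2.0 / math.pi) * math.sin(i * x)
           for i in range(1, nx + 1)]
          for x in xs]

assert len(xsines) == 3 and len(xsines[0]) == 2
assert abs(xsines[0][0] - math.sqrt(2.0 / math.pi) * math.sin(0.5)) < 1e-12
```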
The eigenvalue problem itself is handed to the PESSL routine syevx, which solves it and returns both the eigenvalues and the eigenvectors:

   CALL syevx(f, lambda, 'u', z=b)

You will find this routine documented on page 598 of ``PESSL Guide and Reference''.
The routine computes eigenvalues and eigenvectors of a real symmetric matrix.
Is our matrix F real and symmetric? The matrix is certainly real, because of the way we have constructed it. The matrix is really based on sines and cosines, which we have replaced at some stage with equivalent Euler formulas employing exponentials. So we are not working here with auxiliary imaginary quantities that we would have to throw away at some stage. All entries of F are real.
Is F symmetric? Consider two indexes: l = (j - 1)nx + k and l' = (j' - 1)nx + k'. Switching l and l' around corresponds to switching j and j' and then k and k' around. But this would not affect expressions such as kk' + l2r jj' and kk' - l2r jj' in front of the integrals, and the integrals themselves are symmetric under this exchange as well.

To sum up, F is real and symmetric. Symmetry implies that F = F^T, so F is also normal, and hence the diagonalising transformation can be chosen to be orthonormal.
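Because F is real and symmetric, any symmetric eigen-solver applies. As a quick cross-check outside PESSL, here is the same idea in Python with numpy (assumed available), whose eigh plays the role that syevx plays in the program:

```python
import numpy as np

# A small real symmetric matrix standing in for F (made-up values):
F = np.array([[2.0, 1.0],
              [1.0, 3.0]])

lam, Z = np.linalg.eigh(F)

# The eigenvectors come out orthonormal ...
assert np.allclose(Z.T @ Z, np.eye(2))
# ... and they diagonalise F: F = Z diag(lam) Z^T.
assert np.allclose(Z @ np.diag(lam) @ Z.T, F)
```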
syevx has a somewhat complicated synopsis. It takes three mandatory arguments and then a plethora of optional arguments, all of which must be flagged with appropriate keywords, such as z=.
The mandatory arguments are the matrix f itself, the array lambda into which the eigenvalues are returned, and a character flag, which can be 'U' or 'L' (case doesn't matter), and which tells syevx whether it will find the valid values of F in the upper or in the lower part of matrix f. Because that matrix is symmetric, syevx references only half of it.
The optional argument we use is z=. If this option is requested, subroutine syevx will return orthonormal eigenvectors of F on exit in preallocated (and correctly dimensioned) space given as an argument to this option. In our case this is the array b.
Having called syevx, we evaluate first ab and then abt for the various values of t:

   DO i = 1, n
      ab(i) = 0.d0
      DO j = 1, n
         ab(i) = b(j,i)*a_rep(j) + ab(i)
      ENDDO
   ENDDO
   DO it = 1, dif_ntemps
      DO i = 1, n
         abt(i,it) = EXP( -lambda(i)*(it-1)*dif_delta_t ) * ab(i)
      ENDDO
   ENDDO
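The first loop nest is just ab = b^T a_rep. In Python (plain lists; b and a here are tiny made-up stand-ins for the program's b and a_rep, and rotate_into_eigenbasis is a hypothetical helper name):

```python
# ab(i) = sum_j b(j,i) * a(j), i.e. ab = b^T a, mirroring the loop nest above.
def rotate_into_eigenbasis(b, a):
    n = len(a)
    ab = [0.0] * n
    for i in range(n):
        for j in range(n):
            ab[i] = b[j][i] * a[j] + ab[i]
    return ab

# Toy orthonormal b (a 90-degree rotation) and a toy vector a:
b = [[0.0, 1.0],
     [-1.0, 0.0]]
a = [2.0, 3.0]
assert rotate_into_eigenbasis(b, a) == [-3.0, 2.0]
```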
To rotate abt back onto the original basis with b, there is a special PESSL subroutine called gemm, which stands for General Matrix-Matrix Product, and which can do it for us easily and very efficiently. See page 497 of ``PESSL Guide and Reference'' for more information about it. This subroutine, depending on the various switches with which it's been called, will perform one of seven variants of the general matrix-matrix product.
We call this subroutine in the following way:

   CALL gemm(1.d0, b, abt, 0.d0, x, 'N', 'N')

And the meaning of the various parameters passed to it is as follows. The scalars 1.d0 and 0.d0 multiply the product and the output array respectively; b is matrix A and abt is matrix B. The matrix abt is of a peculiar type here, because its second index numbers time slices. The first 'N' tells gemm that matrix A should not be transposed, and the second 'N' tells gemm that matrix B should not be transposed.
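In effect the call computes x = 1.0*b*abt + 0.0*x, a plain matrix product. A Python sketch of this shape of operation, with numpy assumed available (gemm_like is a hypothetical name, not a PESSL routine):

```python
import numpy as np

def gemm_like(alpha, A, B, beta, C, transa='N', transb='N'):
    """C <- alpha*op(A) @ op(B) + beta*C, the shape of operation gemm performs."""
    opA = A.T if transa == 'T' else A
    opB = B.T if transb == 'T' else B
    return alpha * (opA @ opB) + beta * C

b = np.array([[1.0, 2.0],
              [3.0, 4.0]])
abt = np.eye(2)                      # identity, so the product is just b
x = np.zeros((2, 2))
x = gemm_like(1.0, b, abt, 0.0, x)   # mirrors gemm(1.d0, b, abt, 0.d0, x, 'N', 'N')
assert np.allclose(x, b)
```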
On return, the result is stored in x. All that still needs to be done is to convert it to values of T(x, y, t). This is done by the following two loops:

   DO it = 1, dif_ntemps
      DO k = 1, dif_npts
         temp(k,it) = 0.d0
      ENDDO
   ENDDO
   DO it = 1, dif_ntemps
      DO i = 1, n
         iy = (i-1)/dif_nx+1; ix = MOD(i-1,dif_nx)+1
         DO k = 1, dif_npts
            temp(k,it) = temp(k,it) + xsines(k,ix) * ysines(k,iy) * x(i,it)
         ENDDO
      ENDDO
   ENDDO
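Stripped of the Fortran index gymnastics, the reconstruction of T amounts to the following Python sketch (all array values are made up; nx, ny, npts and ntemps stand for dif_nx, dif_ny, dif_npts and dif_ntemps):

```python
nx, ny = 2, 2
n = nx * ny               # number of collapsed harmonics
npts, ntemps = 5, 2       # grid points and time slices

# Made-up stand-ins for xsines, ysines and the rotated coefficients x:
xsines = [[0.1 * (k + ix + 1) for ix in range(nx)] for k in range(npts)]
ysines = [[0.2 * (k + iy + 1) for iy in range(ny)] for k in range(npts)]
xcoef = [[1.0] * ntemps for _ in range(n)]

temp = [[0.0] * ntemps for _ in range(npts)]
for it in range(ntemps):
    for i in range(n):          # i runs over the collapsed harmonic index
        ix = i % nx             # 0-based versions of the Fortran ix, iy
        iy = i // nx
        for k in range(npts):   # k runs over the physical grid points
            temp[k][it] += xsines[k][ix] * ysines[k][iy] * xcoef[i][it]

# Each entry is the sum over all harmonics of xsines * ysines * x:
expected = sum(xsines[0][i % nx] * ysines[0][i // nx] for i in range(n))
assert abs(temp[0][0] - expected) < 1e-12
```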
The WRITE statements simply write all the results to standard output. They're not worth quoting here. The only thing that you ought to remember is that in HPF programs only one node is responsible for I/O. There is no parallel I/O in standard HPF. Yet. So the HPF run-time system has to collect all those distributed arrays (remember that temp is distributed so that various time slices live on various processors) on a single node, and then that node does the printing on standard output or the writing to files.
When trying to link HDF subroutines with HPF programs, though, you must not assume a priori that HPF will do that for you. You must declare all HDF-related procedures yourself.
At the end the program deallocates all the arrays:

   DEALLOCATE(xsines); DEALLOCATE(ysines); DEALLOCATE(lambda)
   DEALLOCATE(temp); DEALLOCATE(a); DEALLOCATE(abt); DEALLOCATE(ab)
   DEALLOCATE(f); DEALLOCATE(dif_x); DEALLOCATE(dif_y)
Observe that various arrays used in this program have been very thoughtfully distributed or replicated, as the case may be, in order to minimize communication between processes.