Next: LU Decomposition
Up: Gauss-Jordan Elimination
Previous: The Discussion
The algorithm sweeps down the matrix from the top
left corner to the bottom right corner, leaving
zero subdiagonal elements behind it.
Figure 5.1:
Communication and computation in the various
phases of the HPF Gaussian (from Foster)
![\includegraphics*[width=5cm,angle=270]{img1000.ps}](img644.gif) |
What is parallel in the algorithm?
- 1.
MAXLOC: reduction operation on the row and column
defined by the mask lpiv,
then broadcast within that row and column
- 2.
- scale factors require N-n independent operations
within column
icol
- 3.
- scale factor and a pivot row value must be broadcast
within each column and
row respectively
- 4.
- the reductions require
independent
operations
Attributes of the computation:
- There is little locality in communication,
apart from broadcasts and reductions in rows and columns.
- Computation is clustered: much of it is performed in a single
row and column, and then, once we get to the
reductions, in the bottom right hand corner.
- A
BLOCK distribution is not going to be
advantageous: it would result in many processors being idle!
- Suggested distribution for a small number of processors:
!HPF$ ALIGN B(:,:) WITH A(:,:)
!HPF$ DISTRIBUTE A(*,CYCLIC)
- If a large number of processors is available:
!HPF$ ALIGN B(:,:) WITH A(:,:)
!HPF$ DISTRIBUTE A(CYCLIC,CYCLIC)
Next: LU Decomposition
Up: Gauss-Jordan Elimination
Previous: The Discussion
Zdzislaw Meglicki
2001-02-26