SUBROUTINE jacobi(a, d, v, nrot)
USE nrtype; USE nrutil, ONLY : assert_eq, get_diag, nrerror, unit_matrix, &
upper_triangle
IMPLICIT NONE
INTEGER(i4b), INTENT(out) :: nrot
REAL(sp), DIMENSION(:), INTENT(out) :: d
REAL(sp), DIMENSION(:,:), INTENT(inout) :: a
REAL(sp), DIMENSION(:,:), INTENT(out) :: v
INTEGER(i4b) :: i, ip, iq, n
REAL(sp) :: c, g, h, s, sm, t, tau, theta, tresh
REAL(sp), DIMENSION(SIZE(d)) :: b, z
n = assert_eq((/SIZE(a, 1), SIZE(a, 2), SIZE(d), SIZE(v, 1), SIZE(v, 2)/), &
'jacobi')
CALL unit_matrix(v(:,:))
b(:) = get_diag(a(:,:))
d(:) = b(:)
z(:) = 0.0
nrot = 0
DO i = 1, 50
sm = SUM(ABS(a), mask=upper_triangle(n,n))
IF(sm == 0.0) RETURN
tresh = MERGE(0.2_sp*sm/n**2, 0.0_sp, i < 4)
DO ip=1, n-1
DO iq=ip+1, n
g=100.0_sp*ABS(a(ip,iq))
IF((i > 4) .AND. (ABS(d(ip))+g == ABS(d(ip))) &
.AND. (ABS(d(iq))+g == ABS(d(iq)))) THEN
a(ip,iq)=0.0
ELSE IF (ABS(a(ip,iq)) > tresh) THEN
h = d(iq)-d(ip)
IF(ABS(h)+g == ABS(h)) THEN
t = a(ip,iq)/h
ELSE
theta = 0.5_sp*h/a(ip,iq)
t = 1.0_sp/(ABS(theta)+SQRT(1.0_sp+theta**2))
IF(theta < 0.0) t = -t
END IF
c = 1.0_sp / SQRT(1+t**2)
s = t*c
tau = s/(1.0_sp + c)
h = t*a(ip, iq)
z(ip) = z(ip) - h
z(iq) = z(iq) + h
d(ip) = d(ip) - h
d(iq) = d(iq) + h
a(ip, iq) = 0.0
CALL jrotate(a(1:ip-1, ip), a(1:ip-1, iq))
CALL jrotate(a(ip, ip+1:iq-1), a(ip+1:iq-1, iq))
CALL jrotate(a(ip, iq+1:n), a(iq, iq+1:n))
CALL jrotate(v(:,ip), v(:,iq))
nrot=nrot+1
END IF
END DO
END DO
b(:) = b(:) + z(:)
d(:) = b(:)
z(:) = 0.0
END DO
CALL nrerror('too many iterations in jacobi')
CONTAINS
SUBROUTINE jrotate(a1, a2)
REAL(sp), DIMENSION(:), INTENT(inout) :: a1, a2
REAL(sp), DIMENSION(SIZE(a1)) : wk1
wk1(:) = a1(:)
a1(:) = a1(:) - s*(a2(:) + a1(:) * tau)
a2(:) = a2(:) + s*(wk1(:) - a2(:) * tau)
END SUBROUTINE jrotate
END SUBROUTINE jacobi
Let me explain this code now in some detail.
The subroutine computes all eigenvalues and eigenvectors of
an
matrix a. The elements of a above
the diagonal are destroyed in the process. The eigenvalues
are returned on a vector of length N: d, whereas
the eigenvectors are returned on v, which a matrix
,
like a. The total number of Jacobi rotations
is returned on nrot.
The subroutine begins with a call to function assert_eq, which
is a still to be defined function, whose purpose is to check
that a is indeed a square matrix and that the sizes of
d and v match a. If these conditions are
not met, the function exits with error message that contains
the word 'jacobi', otherwise the common dimension Nis returned and captured on n:
n = assert_eq((/SIZE(a, 1), SIZE(a, 2), SIZE(d), SIZE(v, 1), SIZE(v, 2)/), &
'jacobi')
Then we initialize v to a unit matrix, extract the diagonal
from a and copy it both on an auxiliary array b and
on d, which will ultimately return the eigenvalues. Another
auxiliary array, z is set to zero. The subroutine could be
written without b and z - their use obscures the flow
of computation a little, and, I think that it is here only for the sake
of improving the numerical accuracy of the computational process, which
contains a lot of additions and subtractions:
CALL unit_matrix(v(:,:)) b(:) = get_diag(a(:,:)) d(:) = b(:) z(:) = 0.0
Now we initialize our Jacobi rotation counter to zero and commence Jacobi sweeps. The procedure should normally converge in no more than about 5 sweeps. If it takes more, something must be really wrong. This main loop looks as follows:
nrot = 0
DO i = 1, 50
sm = SUM(ABS(a), mask=upper_triangle(n,n))
IF(sm == 0.0) RETURN
tresh = MERGE(0.2_sp*sm/n**2, 0.0_sp, i < 4)
DO ip=1, n-1
DO iq=ip+1, n
blah... blah... blah...
END DO
END DO
b(:) = b(:) + z(:)
d(:) = b(:)
z(:) = 0.0
END DO
CALL nrerror('too many iterations in jacobi')
At the very beginning we sum absolute values of
all terms in the upper triangle
of matrix
A (or a) and if that sum
ends up being 0 up to machine accuracy, we return:
The parameter tresh is calculated by calling a Fortran intrinsic
MERGE, which returns:
We skip the discussion of what's inside the double loop that sweeps through the upper triangular of A and go right to what happens after the sweep is done.
What you see there is what those two auxiliary vectors b and
z are for. Vector z accumulates t apq with an
appropriate sign (- for pp and + for qq) for every diagonal
term that is stored on b. All those accumulated contributions
are now added to the diagonal, and the new updated diagonal is
transferred to d. Then vector z is cleared and
made ready for the new sweep.
You will see that inside the sweep loop the diagonal, which is stored
on d is updated all the time too, in a similar way to z, so, in principle,
d should contain the same numbers as b(:) + z(:), why
then do d(:) = b(:)? It is here, I think, that the designers of
the code tried to minimize rounding errors that would accumulate
on d(:) in the process.
If for some reason we've done 50 iterations and the process still hasn't converged, we exit the routine with an error message.
Now let us have a look at what's inside the double loop that sweeps through the upper triangular of A.
We begin by testing how large is the element Apq, which is
to be annihilated. So first we simply take
,
and then we execute this elaborate IF statement:
g=100.0_sp*ABS(a(ip,iq))
IF((i > 4) .AND. (ABS(d(ip))+g == ABS(d(ip))) &
.AND. (ABS(d(iq))+g == ABS(d(iq)))) THEN
a(ip,iq)=0.0
ELSE IF (ABS(a(ip,iq)) > tresh) THEN
blah... blah... blah...
END IF
The first clause of IF test for the following condition:
Otherwise, i.e., if the off-diagonal element needs to be rotated away, then check if the element Apq is greater than the threashold we have evaluated earlier.
That threashold was different from zero only for the first three
sweeps, and its value was a kind of an average value for the
upper triangular:
.
So it is here that we implement Jacobi rotation
for the off-diagonal elements that stand out during the first three
sweeps.
By now we've done enough checking, and enough avoiding, and finally we have to get down to work and perform the actual rotation.
The first step is to evaluate
d:
h = d(iq)-d(ip)
IF(ABS(h)+g == ABS(h)) THEN
t = a(ip,iq)/h
ELSE
theta = 0.5_sp*h/a(ip,iq)
t = 1.0_sp/(ABS(theta)+SQRT(1.0_sp+theta**2))
IF(theta < 0.0) t = -t
END IF
Now, once we have t, we evaluate c, s, and
:
c = 1.0_sp / SQRT(1+t**2)
s = t*c
tau = s/(1.0_sp + c)
The next step is to rotate the diagonal elements, because
they are easy to do:
h = t*a(ip, iq)
z(ip) = z(ip) - h
z(iq) = z(iq) + h
d(ip) = d(ip) - h
d(iq) = d(iq) + h
Observe that it is here that we collect the total change to aii per
sweep on z, whereas the diagonal terms themselves are updated
for every sweep on d.
At this stage we won't need apq any more, because it is no longer used explicitly in rotating arp and arq, so we annihilate it:
a(ip, iq) = 0.0
Finally we rotate arp and arq:
jrotate:
CALL jrotate(a(1:ip-1, ip), a(1:ip-1, iq))
CALL jrotate(a(ip, ip+1:iq-1), a(ip+1:iq-1, iq))
CALL jrotate(a(ip, iq+1:n), a(iq, iq+1:n))
These three call correspond to three subsets of
A
that need to be rotated separately, so that we don't hit the
point (p,q) itself.
The first call rotates the two vertical lines: one that stretches from (p,q) upwards, and the other one, parallel to it and slightly to the left, that stretches from the diagonal upwards. Both are of equal length.
The second call rotates the two perpendicular lines between the point (p,q) and the diagonal: one is horizontal and the other is vertical. Again, they are of equal length.
Finall, the third call rotates the two horizontal lines: one that stretches from (p,q) to the right boundary, and the other one slightly below, that stretches from the right boundary to the diagonal. Both are of equal length.
Having rotated arp and arq we rotate the matrix of
eigenvectors v. Remember that they are rotated in a much
the same way as arp and arq, i.e.,
CALL jrotate(v(:,ip), v(:,iq))
After we have rotated matrices
A and
V,
we increment the Jacobi rotation counter nrot:
nrot=nrot+1
The auxiliary subroutine jrotate implements the equations:
jrotate performs the computation
in parallel as follows:
wk1(:) = a1(:)
a1(:) = a1(:) - s*(a2(:) + a1(:) * tau)
a2(:) = a2(:) + s*(wk1(:) - a2(:) * tau)
Observe that the parallelism is both in the horizontal and in the vertical direction of matrix A. There is a fair amount of communication involved too. This algorithm will run well on a vector processor or on an SMP, but not so well on an MPP.
In any case, the code presented in this section shows all parallel operations explicitly and clearly. It is now up to the compiler to make use of it.
Unfortunately, there are no machines on the market at present that can operate as effectively on stridden data as they can work with adjacent data, i.e., with non-stridden vectors. For this computer architectures would have to undergo a dramatic modifications: the memory would have to become multidimensional and the processor architecture would have to become multidimensional too.