next up previous index
Next: The Householder-QR/QL Algorithm Up: Jacobi Transformations of a Previous: About Jacobi

Example Code

SUBROUTINE jacobi(a, d, v, nrot)

  USE nrtype; USE nrutil, ONLY : assert_eq, get_diag, nrerror, unit_matrix, &
       upper_triangle

  IMPLICIT NONE
  INTEGER(i4b), INTENT(out) :: nrot
  REAL(sp), DIMENSION(:), INTENT(out) :: d
  REAL(sp), DIMENSION(:,:), INTENT(inout) :: a
  REAL(sp), DIMENSION(:,:), INTENT(out) :: v

  INTEGER(i4b) :: i, ip, iq, n
  REAL(sp) :: c, g, h, s, sm, t, tau, theta, tresh
  REAL(sp), DIMENSION(SIZE(d)) :: b, z

  n = assert_eq((/SIZE(a, 1), SIZE(a, 2), SIZE(d), SIZE(v, 1), SIZE(v, 2)/), &
       'jacobi')
  CALL unit_matrix(v(:,:))
  b(:) = get_diag(a(:,:))
  d(:) = b(:)
  z(:) = 0.0
  nrot = 0
  DO i = 1, 50
     sm = SUM(ABS(a), mask=upper_triangle(n,n))
     IF(sm == 0.0) RETURN
     tresh = MERGE(0.2_sp*sm/n**2, 0.0_sp, i < 4)
     DO ip=1, n-1
        DO iq=ip+1, n
           g=100.0_sp*ABS(a(ip,iq))
           IF((i > 4) .AND. (ABS(d(ip))+g == ABS(d(ip))) &
                .AND. (ABS(d(iq))+g == ABS(d(iq)))) THEN
              a(ip,iq)=0.0
           ELSE IF (ABS(a(ip,iq)) > tresh) THEN
              h = d(iq)-d(ip)
              IF(ABS(h)+g == ABS(h)) THEN
                 t = a(ip,iq)/h
              ELSE
                 theta = 0.5_sp*h/a(ip,iq)
                 t = 1.0_sp/(ABS(theta)+SQRT(1.0_sp+theta**2))
                 IF(theta < 0.0) t = -t
              END IF
              c = 1.0_sp / SQRT(1+t**2)
              s = t*c
              tau = s/(1.0_sp + c)
              h = t*a(ip, iq)
              z(ip) = z(ip) - h
              z(iq) = z(iq) + h
              d(ip) = d(ip) - h
              d(iq) = d(iq) + h
              a(ip, iq) = 0.0
              CALL jrotate(a(1:ip-1, ip), a(1:ip-1, iq))
              CALL jrotate(a(ip, ip+1:iq-1), a(ip+1:iq-1, iq))
              CALL jrotate(a(ip, iq+1:n), a(iq, iq+1:n))
              CALL jrotate(v(:,ip), v(:,iq))
              nrot=nrot+1
           END IF
        END DO
     END DO
     b(:) = b(:) + z(:)
     d(:) = b(:)
     z(:) = 0.0
  END DO
  CALL nrerror('too many iterations in jacobi')

CONTAINS

  SUBROUTINE jrotate(a1, a2)
    REAL(sp), DIMENSION(:), INTENT(inout) :: a1, a2
    REAL(sp), DIMENSION(SIZE(a1)) : wk1
    wk1(:) = a1(:)
    a1(:) = a1(:) - s*(a2(:) + a1(:) * tau)
    a2(:) = a2(:) + s*(wk1(:) - a2(:) * tau)
  END SUBROUTINE jrotate

END SUBROUTINE jacobi

Let me explain this code now in some detail.

The subroutine computes all eigenvalues and eigenvectors of an $N\times N$ matrix a. The elements of a above the diagonal are destroyed in the process. The eigenvalues are returned on a vector of length N: d, whereas the eigenvectors are returned on v, which a matrix $N\times N$, like a. The total number of Jacobi rotations is returned on nrot.

The subroutine begins with a call to function assert_eq, which is a still to be defined function, whose purpose is to check that a is indeed a square matrix and that the sizes of d and v match a. If these conditions are not met, the function exits with error message that contains the word 'jacobi', otherwise the common dimension Nis returned and captured on n:

  n = assert_eq((/SIZE(a, 1), SIZE(a, 2), SIZE(d), SIZE(v, 1), SIZE(v, 2)/), &
       'jacobi')

Then we initialize v to a unit matrix, extract the diagonal from a and copy it both on an auxiliary array b and on d, which will ultimately return the eigenvalues. Another auxiliary array, z is set to zero. The subroutine could be written without b and z - their use obscures the flow of computation a little, and, I think that it is here only for the sake of improving the numerical accuracy of the computational process, which contains a lot of additions and subtractions:

  CALL unit_matrix(v(:,:))
  b(:) = get_diag(a(:,:))
  d(:) = b(:)
  z(:) = 0.0

Now we initialize our Jacobi rotation counter to zero and commence Jacobi sweeps. The procedure should normally converge in no more than about 5 sweeps. If it takes more, something must be really wrong. This main loop looks as follows:

  nrot = 0
  DO i = 1, 50
     sm = SUM(ABS(a), mask=upper_triangle(n,n))
     IF(sm == 0.0) RETURN
     tresh = MERGE(0.2_sp*sm/n**2, 0.0_sp, i < 4)
     DO ip=1, n-1
        DO iq=ip+1, n
           blah... blah... blah...
        END DO
     END DO
     b(:) = b(:) + z(:)
     d(:) = b(:)
     z(:) = 0.0
  END DO
  CALL nrerror('too many iterations in jacobi')
At the very beginning we sum absolute values of all terms in the upper triangle of matrix A (or a) and if that sum ends up being 0 up to machine accuracy, we return:

\begin{displaymath}\hbox{{\tt sm}} = \sum_{p = 1}^{n-1} \sum_{q = p+1}^{n} \vert A_{pq}\vert
\end{displaymath}

No special cleaning is necessary at this stage, because all expected results are already in place.

The parameter tresh is calculated by calling a Fortran intrinsic MERGE, which returns:

The purpose of this parameter is to annihilate only those terms in the upper triangular of A that stick out in the first three sweeps, and only then commence work on the whole upper triangular.

We skip the discussion of what's inside the double loop that sweeps through the upper triangular of A and go right to what happens after the sweep is done.

What you see there is what those two auxiliary vectors b and z are for. Vector z accumulates t apq with an appropriate sign (- for pp and + for qq) for every diagonal term that is stored on b. All those accumulated contributions are now added to the diagonal, and the new updated diagonal is transferred to d. Then vector z is cleared and made ready for the new sweep.

You will see that inside the sweep loop the diagonal, which is stored on d is updated all the time too, in a similar way to z, so, in principle, d should contain the same numbers as b(:) + z(:), why then do d(:) = b(:)? It is here, I think, that the designers of the code tried to minimize rounding errors that would accumulate on d(:) in the process.

If for some reason we've done 50 iterations and the process still hasn't converged, we exit the routine with an error message.

Now let us have a look at what's inside the double loop that sweeps through the upper triangular of A.

We begin by testing how large is the element Apq, which is to be annihilated. So first we simply take $g = 100 \vert A_{pq}\vert$, and then we execute this elaborate IF statement:

           g=100.0_sp*ABS(a(ip,iq))
           IF((i > 4) .AND. (ABS(d(ip))+g == ABS(d(ip))) &
                .AND. (ABS(d(iq))+g == ABS(d(iq)))) THEN
              a(ip,iq)=0.0
           ELSE IF (ABS(a(ip,iq)) > tresh) THEN
              blah... blah... blah...
           END IF
The first clause of IF test for the following condition: How can $A_{qq} + 100 \vert A_{pq}\vert$ be equal to Aqq? It can be, if $100 \vert A_{pq}\vert$ is so much smaller than Aqq that the value of Aqq does not change to within the machine accuracy. In other words what we're saying here is that if the element to be annihilated is so small that the Jacobi rotation will not change the diagonal elements, then don't bother with the rotation at all: just make that off-diagonal element zero.

Otherwise, i.e., if the off-diagonal element needs to be rotated away, then check if the element Apq is greater than the threashold we have evaluated earlier.

That threashold was different from zero only for the first three sweeps, and its value was a kind of an average value for the upper triangular: $\frac{1}{5}\frac{1}{N^2}\sum_{p = 1}^{n-1}\sum_{q = p+1}^{n}\vert A_{pq}\vert$. So it is here that we implement Jacobi rotation for the off-diagonal elements that stand out during the first three sweeps.

By now we've done enough checking, and enough avoiding, and finally we have to get down to work and perform the actual rotation.

The first step is to evaluate

\begin{displaymath}t = \tan{\varphi} = \frac{\hbox{sign}(\theta)}
{\vert\theta\vert + \sqrt{\theta^2 + 1}}
\approx \frac{1}{2\theta}
\end{displaymath}

where the approximation holds for very large $\theta$, and where

\begin{displaymath}\theta = \cot{2\varphi} = \frac{c^2 - s^2}{2sc} = \frac{a_{qq} - a_{pp}}
{2a_{pq}}
\end{displaymath}

Remember that the diagonal terms aqq and app live in d:
              h = d(iq)-d(ip)
              IF(ABS(h)+g == ABS(h)) THEN
                 t = a(ip,iq)/h
              ELSE
                 theta = 0.5_sp*h/a(ip,iq)
                 t = 1.0_sp/(ABS(theta)+SQRT(1.0_sp+theta**2))
                 IF(theta < 0.0) t = -t
              END IF

Now, once we have t, we evaluate c, s, and $\tau$:

\begin{eqnarray*}c &=& \cos{\varphi} = \frac{1}{\sqrt{t^2 + 1}} \\
s &=& \sin{\...
...2}} = \frac{\sin{\varphi}}{1 + \cos{\varphi}}
= \frac{s}{1 + c}
\end{eqnarray*}


              c = 1.0_sp / SQRT(1+t**2)
              s = t*c
              tau = s/(1.0_sp + c)

The next step is to rotate the diagonal elements, because they are easy to do:

\begin{eqnarray*}a'_{pp} &=& a_{pp} - t a_{pq} \\
a'_{qq} &=& a_{qq} + t a_{pq}
\end{eqnarray*}


              h = t*a(ip, iq)
              z(ip) = z(ip) - h
              z(iq) = z(iq) + h
              d(ip) = d(ip) - h
              d(iq) = d(iq) + h
Observe that it is here that we collect the total change to aii per sweep on z, whereas the diagonal terms themselves are updated for every sweep on d.

At this stage we won't need apq any more, because it is no longer used explicitly in rotating arp and arq, so we annihilate it:

              a(ip, iq) = 0.0

Finally we rotate arp and arq:

\begin{eqnarray*}a'_{rp} &=& a_{rp} - s(a_{rq} + \tau a_{rp}) \\
a'_{rq} &=& a_{rq} + s(a_{rp} - \tau a_{rq})
\end{eqnarray*}


by calling an auxiliary subroutine jrotate:
              CALL jrotate(a(1:ip-1, ip), a(1:ip-1, iq))
              CALL jrotate(a(ip, ip+1:iq-1), a(ip+1:iq-1, iq))
              CALL jrotate(a(ip, iq+1:n), a(iq, iq+1:n))
These three call correspond to three subsets of A that need to be rotated separately, so that we don't hit the point (p,q) itself.

The first call rotates the two vertical lines: one that stretches from (p,q) upwards, and the other one, parallel to it and slightly to the left, that stretches from the diagonal upwards. Both are of equal length.

The second call rotates the two perpendicular lines between the point (p,q) and the diagonal: one is horizontal and the other is vertical. Again, they are of equal length.

Finall, the third call rotates the two horizontal lines: one that stretches from (p,q) to the right boundary, and the other one slightly below, that stretches from the right boundary to the diagonal. Both are of equal length.

Having rotated arp and arq we rotate the matrix of eigenvectors v. Remember that they are rotated in a much the same way as arp and arq, i.e.,

\begin{eqnarray*}v'_{rp} &=& v_{rp} - s(v_{rq} + \tau v_{rp}) \\
v'_{rq} &=& v_{rq} + s(v_{rp} - \tau v_{rq})
\end{eqnarray*}


This is implemented simply by calling:
              CALL jrotate(v(:,ip), v(:,iq))

After we have rotated matrices A and V, we increment the Jacobi rotation counter nrot:

              nrot=nrot+1

The auxiliary subroutine jrotate implements the equations:

\begin{eqnarray*}a'_{rp} &=& a_{rp} - s(a_{rq} + \tau a_{rp}) \\
a'_{rq} &=& a_{rq} + s(a_{rp} - \tau a_{rq})
\end{eqnarray*}


Observe one small complication. Because the old arp is needed by the second equation we must save it before we alter it with the first equation. Subroutine jrotate performs the computation in parallel as follows:
    wk1(:) = a1(:)
    a1(:) = a1(:) - s*(a2(:) + a1(:) * tau)
    a2(:) = a2(:) + s*(wk1(:) - a2(:) * tau)

Observe that the parallelism is both in the horizontal and in the vertical direction of matrix A. There is a fair amount of communication involved too. This algorithm will run well on a vector processor or on an SMP, but not so well on an MPP.

In any case, the code presented in this section shows all parallel operations explicitly and clearly. It is now up to the compiler to make use of it.

Unfortunately, there are no machines on the market at present that can operate as effectively on stridden data as they can work with adjacent data, i.e., with non-stridden vectors. For this computer architectures would have to undergo a dramatic modifications: the memory would have to become multidimensional and the processor architecture would have to become multidimensional too.


next up previous index
Next: The Householder-QR/QL Algorithm Up: Jacobi Transformations of a Previous: About Jacobi
Zdzislaw Meglicki
2001-02-26