next up previous index
Next: Fast Fourier Transform in Up: Fast Fourier Transform Previous: A Recursive Implementation of

The Danielson Lanczos Algorithm

The algorithm presented above can be rewritten in a more general way as follows:

nFk = $\displaystyle \sum_{j=0}^{n-1} e^{-2\pi i k j / n} f_j$  
  = $\displaystyle \sum_{j=0}^{n/2 - 1} e^{-2\pi i k (2j) / n} f_{2j}
+ \sum_{j=0}^{n/2 - 1} e^{-2\pi i k (2j + 1) / n} f_{2j + 1}$  
  = $\displaystyle \sum_{j=0}^{n/2 - 1} e^{-2\pi i k j / (n/2)} f_{2j}
+ \omega_n^k
\sum_{j=0}^{n/2 - 1} e^{-2\pi i k j / (n/2)} f_{2j+1}$  
  = $\displaystyle {}^{n/2}F_k^e + \omega_n^k \cdot {}^{n/2}F_k^o$ (2.21)

where n/2Fke is an n/2-point DFT based on sampling even points of the original DFT and evaluated at harmonic k (observe that here we may have that k > n/2 - 1, but since $e^{-2\pi i k / (n/2)}$is periodic in k with period n/2, we can always reduce k to something that's within the interval [0, n/2-1]), and n/2Fko is an n/2-point DFT based on sampling odd points of the original DFT and evaluated at harmonic k.

At this stage we can apply the same formula recursively to n/2Fke and to n/2Fko and reduce the problem to the evaluation of n/4Fkee, n/4Fkeo, n/4Fkoe, n/4Fkoo. This, in turn, can be reduced to the evaluation of n/8Fkeee, n/8Fkeeo, and so on, until n/8Fkooo (23 = 8 terms).

At the end of this process we're going to hit:

\begin{displaymath}{}^{n/n}F_k^{oooeee\ldots eo} = f_m
\end{displaymath} (2.22)

where the m can be evaluated as follows:

Let us go back to our original example, where we had a look at 8F and evaluate, say, 8F5. According to our formula:

\begin{eqnarray*}{}^8F_5 &=& {}^4F_5^e + \omega^5_8 \cdot {}^4F_5^o \\
&=& \le...
...cdot \left(
f_{3} + \omega^5_2\cdot f_{7}
\right)
\right) \\
\end{eqnarray*}


Observe that reversing the bits that correspond to numbers $1, 2, 3, \ldots, n$ back to front will automatically generate the required pairs:

\begin{displaymath}\begin{array}{ccccccc}
0 & \rightarrow & 000 & \rightarrow ...
...ghtarrow & 111 & \rightarrow 111 & \rightarrow & 7
\end{array}\end{displaymath}

And this suggests the following computational strategy:
1.
Sort the data into bit-reversed order.
2.
Evaluate 4 2F transforms using pairs that result from that ordering.
3.
Evaluate 2 4F transforms using ordering obtained in the previous step.
4.
Evaluate 1 8F transform using ordering obtained in the previous step.

How to sort the data into bit-reversed order?

There is a Fortran function that will come handy. The function is

ISHFTC(I, Shift, Length)
and it works as follows: ISHFTC(I, 3, 17) will shift circularly 17 rightmost bits of I by 3 places to the left. If the Shift parameter is negative then the bits are shifted to the right.

How is this function going to help? Observe how we can bit-reverse a sequence by applying a series of righmost circular shifts with diminishing length to it:

\begin{eqnarray*}&& [a, b, c, d, e, f, g, h] \\
\hbox{\tt ishftc(i, -1, 8)} &\r...
...box{\tt ishftc(i, -1, 2)} &\rightarrow& [h, g, f, e, d, c, b, a]
\end{eqnarray*}


Here is a Fortran code that does the same:
PROGRAM reverse

  INTEGER i

  DO i = 0, 7
     WRITE(*,*) 'i = ', i, '  bit_reverse(i, 3) = ', bit_reverse(i, 3)
  END DO

CONTAINS

  FUNCTION bit_reverse(i, size)
    INTEGER :: bit_reverse
    INTEGER, INTENT(in) :: i, size

    INTEGER :: length, temp

    temp = i
    DO length = size, 2, -1
       temp = ISHFTC(temp, -1, length)
    END DO
    bit_reverse = temp

  END FUNCTION bit_reverse

END PROGRAM reverse
And here is how it compiles and runs:
gustav@blanc:../src 20:25:45 !600 $ f90 -o reverse reverse.f90
gustav@blanc:../src 20:38:44 !601 $ ./reverse
 i = 0  bit_reverse(i, 3) = 0
 i = 1  bit_reverse(i, 3) = 4
 i = 2  bit_reverse(i, 3) = 2
 i = 3  bit_reverse(i, 3) = 6
 i = 4  bit_reverse(i, 3) = 1
 i = 5  bit_reverse(i, 3) = 5
 i = 6  bit_reverse(i, 3) = 3
 i = 7  bit_reverse(i, 3) = 7
gustav@blanc:../src 20:38:46 !602 $

A clever compiler should be able to recognise that the whole operation here can be performed in situ, i.e., on temp. If converted directly into appropriate hexadecimal instructions, it can be done very quickly.

We're now ready to begin discussion of the Danielson-Lanczos code.

The code comprises two parts. The first one rearranges the data using the reverse-bit ordering. The second part of the code builds up the multi-point Fourier Transform starting from single-point transforms.

Here is the first part of the code, which is going to do Fast Fourier Transform in-situ:

  SUBROUTINE fft(x)
    IMPLICIT NONE

    COMPLEX(kind=8), DIMENSION(0:), INTENT(inout) :: x

    INTEGER :: length, number_of_bits, i, j

    length = SIZE(x)
    number_of_bits = INT(LOG(REAL(length))/LOG(2.0))
    IF (2**number_of_bits .NE. length) THEN
       WRITE(*,*) 'fft: error: input data length must be a power of 2'
       STOP 10
    ELSE
       DO i = 1, length-2
          j = bit_reverse(i, number_of_bits)
          IF (j .GT. i) CALL swap(x(i), x(j))
       END DO
    END IF
    ...
Observe a particular trick I have exploited here: regardless of how the input data is indexed in the calling program, inside subroutine fft, the array is always numbered from 0 through length - 1, where length is the length of the array. This is accomplished by declaring x to be DIMENSION(0:). Also observe that I do not bother to bit_reverse i=0 and i=length-1, because these are guaranteed to be two obvious palindromes: 00...0 and 11...1.

The Danielson-Lanczos part looks as follows, and both at first as well as at consecutive looks it is quite hard to understand:

    mmax = 1
    DO
       IF ( length .LE. mmax) EXIT
       istep = 2 * mmax
       theta = - pi/(mmax)
       wp=CMPLX(-2.0_8 * SIN(0.5_8*theta)**2, SIN(theta), kind=8)
       w=CMPLX(1.0_8, 0.0_8, kind=8)
       DO m = 1, mmax
          ws=w
          DO i=m-1, length-1, istep
             j=i+mmax
             temp = ws*x(j)
             x(j) = x(i) - temp
             x(i) = x(i) + temp
          END DO
          w = w*wp + w
       END DO
       mmax = istep
    END DO

The easiest way to figure out how this works, is to instrument the code and make it print on standard output what it is doing. So I've done just that and here is what I got:

 outer do loop: mmax = 1 istep = 2
    middle do loop: m = 1 ws = (1.,0.E+0)
       inner do loop: i = 1 j = 2
       inner do loop: i = 3 j = 4
       inner do loop: i = 5 j = 6
       inner do loop: i = 7 j = 8
 outer do loop: mmax = 2 istep = 4
    middle do loop: m = 1 ws = (1.,0.E+0)
       inner do loop: i = 1 j = 3
       inner do loop: i = 5 j = 7
    middle do loop: m = 2 ws = (2.22044604925031308E-16,-1.)
       inner do loop: i = 2 j = 4
       inner do loop: i = 6 j = 8
 outer do loop: mmax = 4 istep = 8
    middle do loop: m = 1 ws = (1.,0.E+0)
       inner do loop: i = 1 j = 5
    middle do loop: m = 2 ws = (0.70710678118654746,-0.70710678118654746)
       inner do loop: i = 2 j = 6
    middle do loop: m = 3 ws = (0.E+0,-0.99999999999999978)
       inner do loop: i = 3 j = 7
    middle do loop: m = 4 ws = (-0.70710678118654735,-0.70710678118654735)
       inner do loop: i = 4 j = 8
In this print-out I have shifted i and j up by 1 so that we can compare what comes out with equations (2.16), (2.17), and (2.18).

So the first thing we notice is that we evaluate 4 2-point transforms, all with the same $\omega$ factor, which is 1. This corresponds to 2D1 = 1

In the next step we evaluate two 4-point transforms and this time there are two different $\omega$ factors, which are 1 and -i. Observe that this corresponds to 4D2, which is

\begin{displaymath}\left(
\begin{array}{cc}
1 & 0 \\
0 & \omega_4
\end{arr...
...\left(\begin{array}{cc}
1 & 0 \\
0 & -i
\end{array}\right)
\end{displaymath}

In the last step we evaluate one 4 point transform, and this time we have 4 different $\omega$ factors, which correspond to 8D4, which is:

\begin{displaymath}\left(\begin{array}{cccc}
1 & 0 & 0 & 0 \\
0 & \omega_8 & ...
...& -\frac{1}{\sqrt{2}} - \frac{i}{\sqrt{2}}
\end{array}\right)
\end{displaymath}

The $\omega$ factors are calculated here manually by exploiting a trigonometric recurrence on reals and then constructing corresponding floating point numbers. This is a very efficient way of doing thing, but an obscure one too.

The code that implements the Danielson-Lanczos algorithm discussed here is a very old one. It was written years ago by N. Brenner of Lincoln Laboratories.

As you compare it against our recursive code, you must appreciate the clarity of the recursive algorithm.

Here is the full Danielson-Lanczos program:

PROGRAM danielson_lanczos

  IMPLICIT NONE

  REAL(kind=8), PARAMETER :: pi = 3.141592653589793_8
  INTEGER :: i
  COMPLEX(kind=8), DIMENSION(8) :: x, y

  x = (/ (i, i=0,7) /); x = 2.0_8 * pi / 8.0_8 * x; x = SIN(x) + COS(2 * x) &
      - SIN(3 * x)
  y = x
  CALL fft(y)
  y = 2.0_8 * pi / 8.0_8 * y
  WRITE(*,'(8 ("(", f6.3, ",", f7.3, ")", /))') x
  WRITE(*,'(8 ("(", f6.3, ",", f7.3, ")", /))') y
  STOP 0

CONTAINS

  FUNCTION bit_reverse(i, size)
    INTEGER :: bit_reverse
    INTEGER, INTENT(in) :: i, size

    INTEGER :: length, temp

    temp = i
    DO length = size, 2, -1
       temp = ISHFTC(temp, -1, length)
    END DO
    bit_reverse = temp

  END FUNCTION bit_reverse

  SUBROUTINE swap (a, b)
    COMPLEX(kind=8) :: a, b
    COMPLEX(kind=8) :: hold

    hold = a
    a = b
    b = hold

  END SUBROUTINE swap

  SUBROUTINE fft(x)

    IMPLICIT NONE

    COMPLEX(kind=8), DIMENSION(0:), INTENT(inout) :: x

    DOUBLE PRECISION, PARAMETER :: pi = 3.141592653589793_8
    COMPLEX(kind=8) :: wp, w, ws, temp
    REAL(kind=8) :: theta
    INTEGER :: length, number_of_bits, i, j, mmax, istep, m

    length = SIZE(x)
    number_of_bits = INT(LOG(REAL(length))/LOG(2.0))
    IF (2**number_of_bits .NE. length) THEN
       WRITE(*,*) 'fft: error: input data length must be a power of 2'
       STOP 10
    ELSE
       DO i = 1, length-2
          j = bit_reverse(i, number_of_bits)
          IF (j .GT. i) CALL swap(x(i), x(j))
       END DO
    END IF

    mmax = 1
    DO
       IF ( length .LE. mmax) EXIT
       istep = 2 * mmax
       theta = - pi/(mmax)
       wp=CMPLX(-2.0_8 * SIN(0.5_8*theta)**2, SIN(theta), kind=8)
       w=CMPLX(1.0_8, 0.0_8, kind=8)
       DO m = 1, mmax
          ws=w
          DO i=m-1, length-1, istep
             j=i+mmax
             temp = ws*x(j)
             x(j) = x(i) - temp
             x(i) = x(i) + temp
          END DO
          w = w*wp + w
       END DO
       mmax = istep
    END DO

  END SUBROUTINE fft

END PROGRAM danielson_lanczos
And here is how this code compiles and runs:
gustav@blanc:../src 00:10:38 !565 $ f90 -o lanczos lanczos.f90
gustav@blanc:../src 00:10:53 !566 $ ./lanczos
( 1.000,  0.000)
( 0.000,  0.000)
( 1.000,  0.000)
( 0.000,  0.000)
( 1.000,  0.000)
( 0.000,  0.000)
(-3.000,  0.000)
( 0.000,  0.000)

( 0.000,  0.000)
( 0.000, -3.142)
( 3.142,  0.000)
( 0.000,  3.142)
( 0.000,  0.000)
( 0.000, -3.142)
( 3.142,  0.000)
( 0.000,  3.142)

 STOP 0
gustav@blanc:../src 00:10:55 !567 $


next up previous index
Next: Fast Fourier Transform in Up: Fast Fourier Transform Previous: A Recursive Implementation of
Zdzislaw Meglicki
2001-02-26