Next: Fast Fourier Transform in Up: Fast Fourier Transform Previous: A Recursive Implementation of

## The Danielson Lanczos Algorithm

The algorithm presented above can be rewritten in a more general way as follows:

 nFk = = = = (2.21)

where n/2Fke is an n/2-point DFT based on sampling even points of the original DFT and evaluated at harmonic k (observe that here we may have that k > n/2 - 1, but since is periodic in k with period n/2, we can always reduce k to something that's within the interval [0, n/2-1]), and n/2Fko is an n/2-point DFT based on sampling odd points of the original DFT and evaluated at harmonic k.

At this stage we can apply the same formula recursively to n/2Fke and to n/2Fko and reduce the problem to the evaluation of n/4Fkee, n/4Fkeo, n/4Fkoe, n/4Fkoo. This, in turn, can be reduced to the evaluation of n/8Fkeee, n/8Fkeeo, and so on, until n/8Fkooo (23 = 8 terms).

At the end of this process we're going to hit:

 (2.22)

where the m can be evaluated as follows:
• Reverse the pattern of es and os back to front.
• Replace every e with 0 and every o with 1.
• Reinterpret the obtained stream of 1s and 0s as a binary representation of a number. That number is our m.

Let us go back to our original example, where we had a look at 8F and evaluate, say, 8F5. According to our formula:

Observe that reversing the bits that correspond to numbers back to front will automatically generate the required pairs:

And this suggests the following computational strategy:
1.
Sort the data into bit-reversed order.
2.
Evaluate 4 2F transforms using pairs that result from that ordering.
3.
Evaluate 2 4F transforms using ordering obtained in the previous step.
4.
Evaluate 1 8F transform using ordering obtained in the previous step.

How to sort the data into bit-reversed order?

There is a Fortran function that will come handy. The function is

ISHFTC(I, Shift, Length)

and it works as follows: ISHFTC(I, 3, 17) will shift circularly 17 rightmost bits of I by 3 places to the left. If the Shift parameter is negative then the bits are shifted to the right.

How is this function going to help? Observe how we can bit-reverse a sequence by applying a series of righmost circular shifts with diminishing length to it:

Here is a Fortran code that does the same:
PROGRAM reverse

INTEGER i

DO i = 0, 7
WRITE(*,*) 'i = ', i, '  bit_reverse(i, 3) = ', bit_reverse(i, 3)
END DO

CONTAINS

FUNCTION bit_reverse(i, size)
INTEGER :: bit_reverse
INTEGER, INTENT(in) :: i, size

INTEGER :: length, temp

temp = i
DO length = size, 2, -1
temp = ISHFTC(temp, -1, length)
END DO
bit_reverse = temp

END FUNCTION bit_reverse

END PROGRAM reverse

And here is how it compiles and runs:
gustav@blanc:../src 20:25:45 !600 $f90 -o reverse reverse.f90 gustav@blanc:../src 20:38:44 !601$ ./reverse
i = 0  bit_reverse(i, 3) = 0
i = 1  bit_reverse(i, 3) = 4
i = 2  bit_reverse(i, 3) = 2
i = 3  bit_reverse(i, 3) = 6
i = 4  bit_reverse(i, 3) = 1
i = 5  bit_reverse(i, 3) = 5
i = 6  bit_reverse(i, 3) = 3
i = 7  bit_reverse(i, 3) = 7
gustav@blanc:../src 20:38:46 !602 $ A clever compiler should be able to recognise that the whole operation here can be performed in situ, i.e., on temp. If converted directly into appropriate hexadecimal instructions, it can be done very quickly. We're now ready to begin discussion of the Danielson-Lanczos code. The code comprises two parts. The first one rearranges the data using the reverse-bit ordering. The second part of the code builds up the multi-point Fourier Transform starting from single-point transforms. Here is the first part of the code, which is going to do Fast Fourier Transform in-situ:  SUBROUTINE fft(x) IMPLICIT NONE COMPLEX(kind=8), DIMENSION(0:), INTENT(inout) :: x INTEGER :: length, number_of_bits, i, j length = SIZE(x) number_of_bits = INT(LOG(REAL(length))/LOG(2.0)) IF (2**number_of_bits .NE. length) THEN WRITE(*,*) 'fft: error: input data length must be a power of 2' STOP 10 ELSE DO i = 1, length-2 j = bit_reverse(i, number_of_bits) IF (j .GT. i) CALL swap(x(i), x(j)) END DO END IF ...  Observe a particular trick I have exploited here: regardless of how the input data is indexed in the calling program, inside subroutine fft, the array is always numbered from 0 through length - 1, where length is the length of the array. This is accomplished by declaring x to be DIMENSION(0:). Also observe that I do not bother to bit_reverse i=0 and i=length-1, because these are guaranteed to be two obvious palindromes: 00...0 and 11...1. The Danielson-Lanczos part looks as follows, and both at first as well as at consecutive looks it is quite hard to understand:  mmax = 1 DO IF ( length .LE. mmax) EXIT istep = 2 * mmax theta = - pi/(mmax) wp=CMPLX(-2.0_8 * SIN(0.5_8*theta)**2, SIN(theta), kind=8) w=CMPLX(1.0_8, 0.0_8, kind=8) DO m = 1, mmax ws=w DO i=m-1, length-1, istep j=i+mmax temp = ws*x(j) x(j) = x(i) - temp x(i) = x(i) + temp END DO w = w*wp + w END DO mmax = istep END DO  The easiest way to figure out how this works, is to instrument the code and make it print on standard output what it is doing. So I've done just that and here is what I got:  outer do loop: mmax = 1 istep = 2 middle do loop: m = 1 ws = (1.,0.E+0) inner do loop: i = 1 j = 2 inner do loop: i = 3 j = 4 inner do loop: i = 5 j = 6 inner do loop: i = 7 j = 8 outer do loop: mmax = 2 istep = 4 middle do loop: m = 1 ws = (1.,0.E+0) inner do loop: i = 1 j = 3 inner do loop: i = 5 j = 7 middle do loop: m = 2 ws = (2.22044604925031308E-16,-1.) inner do loop: i = 2 j = 4 inner do loop: i = 6 j = 8 outer do loop: mmax = 4 istep = 8 middle do loop: m = 1 ws = (1.,0.E+0) inner do loop: i = 1 j = 5 middle do loop: m = 2 ws = (0.70710678118654746,-0.70710678118654746) inner do loop: i = 2 j = 6 middle do loop: m = 3 ws = (0.E+0,-0.99999999999999978) inner do loop: i = 3 j = 7 middle do loop: m = 4 ws = (-0.70710678118654735,-0.70710678118654735) inner do loop: i = 4 j = 8  In this print-out I have shifted i and j up by 1 so that we can compare what comes out with equations (2.16), (2.17), and (2.18). So the first thing we notice is that we evaluate 4 2-point transforms, all with the same factor, which is 1. This corresponds to 2D1 = 1 In the next step we evaluate two 4-point transforms and this time there are two different factors, which are 1 and -i. Observe that this corresponds to 4D2, which is In the last step we evaluate one 4 point transform, and this time we have 4 different factors, which correspond to 8D4, which is: The factors are calculated here manually by exploiting a trigonometric recurrence on reals and then constructing corresponding floating point numbers. This is a very efficient way of doing thing, but an obscure one too. The code that implements the Danielson-Lanczos algorithm discussed here is a very old one. It was written years ago by N. Brenner of Lincoln Laboratories. As you compare it against our recursive code, you must appreciate the clarity of the recursive algorithm. Here is the full Danielson-Lanczos program: PROGRAM danielson_lanczos IMPLICIT NONE REAL(kind=8), PARAMETER :: pi = 3.141592653589793_8 INTEGER :: i COMPLEX(kind=8), DIMENSION(8) :: x, y x = (/ (i, i=0,7) /); x = 2.0_8 * pi / 8.0_8 * x; x = SIN(x) + COS(2 * x) & - SIN(3 * x) y = x CALL fft(y) y = 2.0_8 * pi / 8.0_8 * y WRITE(*,'(8 ("(", f6.3, ",", f7.3, ")", /))') x WRITE(*,'(8 ("(", f6.3, ",", f7.3, ")", /))') y STOP 0 CONTAINS FUNCTION bit_reverse(i, size) INTEGER :: bit_reverse INTEGER, INTENT(in) :: i, size INTEGER :: length, temp temp = i DO length = size, 2, -1 temp = ISHFTC(temp, -1, length) END DO bit_reverse = temp END FUNCTION bit_reverse SUBROUTINE swap (a, b) COMPLEX(kind=8) :: a, b COMPLEX(kind=8) :: hold hold = a a = b b = hold END SUBROUTINE swap SUBROUTINE fft(x) IMPLICIT NONE COMPLEX(kind=8), DIMENSION(0:), INTENT(inout) :: x DOUBLE PRECISION, PARAMETER :: pi = 3.141592653589793_8 COMPLEX(kind=8) :: wp, w, ws, temp REAL(kind=8) :: theta INTEGER :: length, number_of_bits, i, j, mmax, istep, m length = SIZE(x) number_of_bits = INT(LOG(REAL(length))/LOG(2.0)) IF (2**number_of_bits .NE. length) THEN WRITE(*,*) 'fft: error: input data length must be a power of 2' STOP 10 ELSE DO i = 1, length-2 j = bit_reverse(i, number_of_bits) IF (j .GT. i) CALL swap(x(i), x(j)) END DO END IF mmax = 1 DO IF ( length .LE. mmax) EXIT istep = 2 * mmax theta = - pi/(mmax) wp=CMPLX(-2.0_8 * SIN(0.5_8*theta)**2, SIN(theta), kind=8) w=CMPLX(1.0_8, 0.0_8, kind=8) DO m = 1, mmax ws=w DO i=m-1, length-1, istep j=i+mmax temp = ws*x(j) x(j) = x(i) - temp x(i) = x(i) + temp END DO w = w*wp + w END DO mmax = istep END DO END SUBROUTINE fft END PROGRAM danielson_lanczos  And here is how this code compiles and runs: gustav@blanc:../src 00:10:38 !565$ f90 -o lanczos lanczos.f90
gustav@blanc:../src 00:10:53 !566 $./lanczos ( 1.000, 0.000) ( 0.000, 0.000) ( 1.000, 0.000) ( 0.000, 0.000) ( 1.000, 0.000) ( 0.000, 0.000) (-3.000, 0.000) ( 0.000, 0.000) ( 0.000, 0.000) ( 0.000, -3.142) ( 3.142, 0.000) ( 0.000, 3.142) ( 0.000, 0.000) ( 0.000, -3.142) ( 3.142, 0.000) ( 0.000, 3.142) STOP 0 gustav@blanc:../src 00:10:55 !567$


Next: Fast Fourier Transform in Up: Fast Fourier Transform Previous: A Recursive Implementation of
Zdzislaw Meglicki
2001-02-26