The algorithm presented above can be rewritten in a more general
way as follows:
| nFk | = | ![]() |
|
| = | ![]() |
||
| = | ![]() |
||
| = | (2.21) |
At this stage we can apply the same formula recursively to n/2Fke and to n/2Fko and reduce the problem to the evaluation of n/4Fkee, n/4Fkeo, n/4Fkoe, n/4Fkoo. This, in turn, can be reduced to the evaluation of n/8Fkeee, n/8Fkeeo, and so on, until n/8Fkooo (23 = 8 terms).
At the end of this process we're going to hit:
| (2.22) |
Let us go back to our original example, where we had a look
at
8F and evaluate, say,
8F5. According to our formula:
Observe that reversing the bits that correspond to
numbers
back to front will automatically
generate the required pairs:
How to sort the data into bit-reversed order?
There is a Fortran function that will come handy. The function is
ISHFTC(I, Shift, Length)and it works as follows:
ISHFTC(I, 3, 17) will shift circularly 17 rightmost bits
of I by 3 places to the left. If the Shift parameter
is negative then the bits are shifted to the right.
How is this function going to help? Observe how we can bit-reverse
a sequence by applying a series of righmost circular shifts with
diminishing length to it:
PROGRAM reverse
INTEGER i
DO i = 0, 7
WRITE(*,*) 'i = ', i, ' bit_reverse(i, 3) = ', bit_reverse(i, 3)
END DO
CONTAINS
FUNCTION bit_reverse(i, size)
INTEGER :: bit_reverse
INTEGER, INTENT(in) :: i, size
INTEGER :: length, temp
temp = i
DO length = size, 2, -1
temp = ISHFTC(temp, -1, length)
END DO
bit_reverse = temp
END FUNCTION bit_reverse
END PROGRAM reverse
And here is how it compiles and runs:gustav@blanc:../src 20:25:45 !600 $ f90 -o reverse reverse.f90 gustav@blanc:../src 20:38:44 !601 $ ./reverse i = 0 bit_reverse(i, 3) = 0 i = 1 bit_reverse(i, 3) = 4 i = 2 bit_reverse(i, 3) = 2 i = 3 bit_reverse(i, 3) = 6 i = 4 bit_reverse(i, 3) = 1 i = 5 bit_reverse(i, 3) = 5 i = 6 bit_reverse(i, 3) = 3 i = 7 bit_reverse(i, 3) = 7 gustav@blanc:../src 20:38:46 !602 $
A clever compiler should be able to recognise that the whole operation
here can be performed in situ, i.e., on temp. If converted
directly into appropriate hexadecimal instructions, it can be done
very quickly.
We're now ready to begin discussion of the Danielson-Lanczos code.
The code comprises two parts. The first one rearranges the data using the reverse-bit ordering. The second part of the code builds up the multi-point Fourier Transform starting from single-point transforms.
Here is the first part of the code, which is going to do Fast Fourier Transform in-situ:
SUBROUTINE fft(x)
IMPLICIT NONE
COMPLEX(kind=8), DIMENSION(0:), INTENT(inout) :: x
INTEGER :: length, number_of_bits, i, j
length = SIZE(x)
number_of_bits = INT(LOG(REAL(length))/LOG(2.0))
IF (2**number_of_bits .NE. length) THEN
WRITE(*,*) 'fft: error: input data length must be a power of 2'
STOP 10
ELSE
DO i = 1, length-2
j = bit_reverse(i, number_of_bits)
IF (j .GT. i) CALL swap(x(i), x(j))
END DO
END IF
...
Observe a particular trick I have exploited here: regardless of
how the input data is indexed in the calling program, inside
subroutine fft, the array is always numbered from 0
through length - 1, where length is the length
of the array. This is accomplished by declaring x
to be DIMENSION(0:). Also observe that I do not bother
to bit_reverse i=0 and i=length-1, because
these are guaranteed to be two obvious palindromes: 00...0 and
11...1.
The Danielson-Lanczos part looks as follows, and both at first as well as at consecutive looks it is quite hard to understand:
mmax = 1
DO
IF ( length .LE. mmax) EXIT
istep = 2 * mmax
theta = - pi/(mmax)
wp=CMPLX(-2.0_8 * SIN(0.5_8*theta)**2, SIN(theta), kind=8)
w=CMPLX(1.0_8, 0.0_8, kind=8)
DO m = 1, mmax
ws=w
DO i=m-1, length-1, istep
j=i+mmax
temp = ws*x(j)
x(j) = x(i) - temp
x(i) = x(i) + temp
END DO
w = w*wp + w
END DO
mmax = istep
END DO
The easiest way to figure out how this works, is to instrument the code and make it print on standard output what it is doing. So I've done just that and here is what I got:
outer do loop: mmax = 1 istep = 2
middle do loop: m = 1 ws = (1.,0.E+0)
inner do loop: i = 1 j = 2
inner do loop: i = 3 j = 4
inner do loop: i = 5 j = 6
inner do loop: i = 7 j = 8
outer do loop: mmax = 2 istep = 4
middle do loop: m = 1 ws = (1.,0.E+0)
inner do loop: i = 1 j = 3
inner do loop: i = 5 j = 7
middle do loop: m = 2 ws = (2.22044604925031308E-16,-1.)
inner do loop: i = 2 j = 4
inner do loop: i = 6 j = 8
outer do loop: mmax = 4 istep = 8
middle do loop: m = 1 ws = (1.,0.E+0)
inner do loop: i = 1 j = 5
middle do loop: m = 2 ws = (0.70710678118654746,-0.70710678118654746)
inner do loop: i = 2 j = 6
middle do loop: m = 3 ws = (0.E+0,-0.99999999999999978)
inner do loop: i = 3 j = 7
middle do loop: m = 4 ws = (-0.70710678118654735,-0.70710678118654735)
inner do loop: i = 4 j = 8
In this print-out I have shifted i and j up by 1 so that we can compare
what comes out with equations (2.16),
(2.17), and (2.18).
So the first thing we notice is that we evaluate 4 2-point transforms,
all with the same
factor, which is 1. This corresponds
to
2D1 = 1
In the next step we evaluate two 4-point transforms and this time there
are two different
factors, which are 1 and -i. Observe that
this corresponds to
4D2, which is
In the last step we evaluate one 4 point transform, and this time we
have 4 different
factors, which correspond to
8D4, which is:
The code that implements the Danielson-Lanczos algorithm discussed here is a very old one. It was written years ago by N. Brenner of Lincoln Laboratories.
As you compare it against our recursive code, you must appreciate the clarity of the recursive algorithm.
Here is the full Danielson-Lanczos program:
PROGRAM danielson_lanczos
IMPLICIT NONE
REAL(kind=8), PARAMETER :: pi = 3.141592653589793_8
INTEGER :: i
COMPLEX(kind=8), DIMENSION(8) :: x, y
x = (/ (i, i=0,7) /); x = 2.0_8 * pi / 8.0_8 * x; x = SIN(x) + COS(2 * x) &
- SIN(3 * x)
y = x
CALL fft(y)
y = 2.0_8 * pi / 8.0_8 * y
WRITE(*,'(8 ("(", f6.3, ",", f7.3, ")", /))') x
WRITE(*,'(8 ("(", f6.3, ",", f7.3, ")", /))') y
STOP 0
CONTAINS
FUNCTION bit_reverse(i, size)
INTEGER :: bit_reverse
INTEGER, INTENT(in) :: i, size
INTEGER :: length, temp
temp = i
DO length = size, 2, -1
temp = ISHFTC(temp, -1, length)
END DO
bit_reverse = temp
END FUNCTION bit_reverse
SUBROUTINE swap (a, b)
COMPLEX(kind=8) :: a, b
COMPLEX(kind=8) :: hold
hold = a
a = b
b = hold
END SUBROUTINE swap
SUBROUTINE fft(x)
IMPLICIT NONE
COMPLEX(kind=8), DIMENSION(0:), INTENT(inout) :: x
DOUBLE PRECISION, PARAMETER :: pi = 3.141592653589793_8
COMPLEX(kind=8) :: wp, w, ws, temp
REAL(kind=8) :: theta
INTEGER :: length, number_of_bits, i, j, mmax, istep, m
length = SIZE(x)
number_of_bits = INT(LOG(REAL(length))/LOG(2.0))
IF (2**number_of_bits .NE. length) THEN
WRITE(*,*) 'fft: error: input data length must be a power of 2'
STOP 10
ELSE
DO i = 1, length-2
j = bit_reverse(i, number_of_bits)
IF (j .GT. i) CALL swap(x(i), x(j))
END DO
END IF
mmax = 1
DO
IF ( length .LE. mmax) EXIT
istep = 2 * mmax
theta = - pi/(mmax)
wp=CMPLX(-2.0_8 * SIN(0.5_8*theta)**2, SIN(theta), kind=8)
w=CMPLX(1.0_8, 0.0_8, kind=8)
DO m = 1, mmax
ws=w
DO i=m-1, length-1, istep
j=i+mmax
temp = ws*x(j)
x(j) = x(i) - temp
x(i) = x(i) + temp
END DO
w = w*wp + w
END DO
mmax = istep
END DO
END SUBROUTINE fft
END PROGRAM danielson_lanczos
And here is how this code compiles and runs:gustav@blanc:../src 00:10:38 !565 $ f90 -o lanczos lanczos.f90 gustav@blanc:../src 00:10:53 !566 $ ./lanczos ( 1.000, 0.000) ( 0.000, 0.000) ( 1.000, 0.000) ( 0.000, 0.000) ( 1.000, 0.000) ( 0.000, 0.000) (-3.000, 0.000) ( 0.000, 0.000) ( 0.000, 0.000) ( 0.000, -3.142) ( 3.142, 0.000) ( 0.000, 3.142) ( 0.000, 0.000) ( 0.000, -3.142) ( 3.142, 0.000) ( 0.000, 3.142) STOP 0 gustav@blanc:../src 00:10:55 !567 $