next up previous index
Next: Exercise Up: Fast Fourier Transform Previous: Fast Fourier Transform in

One Dimensional Parallel Fast Fourier Transform

Is there a way to parallelise one-dimensional Fast Fourier Transform?

Yes.

The way is as follows. Think of a one dimensional data vector as being a collapsed two-dimensional matrix, whose indexes (j, J), where $j \in [0, m]$ and $J \in [0, M]$ collapse onto f(Jm + j).

The 2-dimensional Discrete Fourier Transform of the matrix would be:

    $\displaystyle F(kM + K) = \sum_{j, J} e^{-2\pi i (kM + K)(Jm + j)/(Mm)} f(Jm + j)$  
    $\displaystyle \quad =
\sum_j\left(
e^{-2\pi i j k / m}
\left(
e^{-2\pi i j K/(Mm)}
\left(
\sum_J
e^{-2\pi i J K / M} f(Jm + j)
\right) \right) \right)$  

This yields the following prescription:
1.
Reshape the original 1-dimensional array into an $m\times M$ array in Fortran column-major order.
2.
Parallel FFT on the second index, i.e., columns.
3.
Multiply each component by a phase factor $\exp{\left(-2\pi i j K/(Mm)\right)}$.
4.
Transpose.
5.
Parallel FFT on the second index, i.e., still columns.
6.
Reshape the two-dimensional array into one-dimensional output.
There is a considerable amount of communication involved in this algorithm, so you may not save much if your communication fabric is slow.

Remember that even on the fastest communication fabrics, e.g., switched SMPs, inter-processor communication operations are at least an order of magnitude slower than local memory accesses. On slower fabrics, e.g., external switched communication networks, inter-processor communications may be 3 or more orders of magnitude slower than local memory access.

In turn, local memory operations may be still between one and two orders of magnitude slower than arithmetic CPU operations.

Moving data around is by far the costliest part of every computational process.


next up previous index
Next: Exercise Up: Fast Fourier Transform Previous: Fast Fourier Transform in
Zdzislaw Meglicki
2001-02-26