High-Performance FFT Using Data Reorganization 


Vol. 12,  No. 3, pp. 215-222, Jun.  2005
10.3745/KIPSTA.2005.12.3.215


PDF
  Abstract

The efficient utilization of cache memories is a key factor in achieving high performance for computing large signal transforms. Nonunit stride access in computation of large DFTs causes cache conflict misses, thereby resulting in poor cache performance. It leads to a severe degradation in overall performance. In this paper, we propose a dynamic data layout approach considering the memory hierarchy system. In our approach, data reorganization is performed between computation stages to reduce the number of cache misses. Also, we develop an efficient search algorithm to determine the optimal tree with the minimum execution time among possible factorization trees considering the size of DFTs and the data access stride. Our approach is applied to compute the fast Fourier Transform (FFT). Experiments were performed on Pentium 4, AthlonTM 64, Alpha 21264, UltraSPARC III. Experiment results show that our FFT achieve performance improvement of up to 3.37 times better than the previous FFT packages.

  Statistics


  Cite this article

[IEEE Style]

N. S. Park and Y. H. Choi, "High-Performance FFT Using Data Reorganization," The KIPS Transactions:PartA, vol. 12, no. 3, pp. 215-222, 2005. DOI: 10.3745/KIPSTA.2005.12.3.215.

[ACM Style]

Neung Soo Park and Yung Ho Choi. 2005. High-Performance FFT Using Data Reorganization. The KIPS Transactions:PartA, 12, 3, (2005), 215-222. DOI: 10.3745/KIPSTA.2005.12.3.215.