Embedded DRAM-Based Memory Customization for Low-Cost FFT Processor Design

In this paper, we present embedded dynamic random access memory (eDRAM)-based memory customization techniques for low-cost fast Fourier transform (FFT) processor design. The main idea rests on the observation that an FFT processor has regular and predictable memory access patterns, which can be efficiently exploited for memory customization using eDRAM. The memory customization approaches are applied to both pipelined and memory-based FFT architectures. In the pipelined architecture, a read wordline (RWL) coupling write-assist scheme and a data-packing scheme are employed to reduce redundant RWL and wordline driving, respectively, in column-interleaved memory arrays. The memory address decoder is also simplified with a thermometer code by exploiting the sequential access patterns. For the memory-based architecture, a modified cached-memory structure is employed in addition to the techniques used in the pipelined FFT architecture. Hardware implementation results for a 2k-point FFT in a 0.11-$\mu\mathrm{m}$ CMOS technology show that the proposed eDRAM-based pipelined and cached-memory FFTs achieve 26.8% and 33.2% power savings, respectively, over a static RAM (SRAM)-based FFT design.
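To illustrate why a thermometer code can simplify the address decoder under sequential access, the following minimal Python sketch models a thermometer-coded row pointer. It is a behavioral illustration under our own assumptions, not the circuit described in the paper; the function names (`thermometer_states`, `decode_wordlines`, `sequential_access`) are hypothetical.

```python
def thermometer_states(n_rows):
    """Yield successive thermometer-coded pointer states for n_rows wordlines.

    State k has k + 1 leading ones, e.g. 1000, 1100, 1110, 1111 for n_rows = 4.
    Advancing to the next address flips exactly one bit.
    """
    for k in range(n_rows):
        yield [1 if i <= k else 0 for i in range(n_rows)]


def decode_wordlines(state):
    """Return a one-hot wordline select vector for a thermometer state.

    Row i is selected where the 1 -> 0 boundary falls, so each wordline
    depends only on two adjacent bits: state[i] AND NOT state[i + 1].
    (Assumption: the bit past the last row is treated as 0.)
    """
    n = len(state)
    return [
        state[i] & (1 - (state[i + 1] if i + 1 < n else 0))
        for i in range(n)
    ]


def sequential_access(n_rows):
    """Return the row index selected at each step of a sequential sweep."""
    return [decode_wordlines(s).index(1) for s in thermometer_states(n_rows)]
```

In this model each wordline select needs only a two-input function of neighboring pointer bits, whereas a conventional binary decoder needs a $\log_2 N$-input match per wordline. The trade-off is that the pointer can only step sequentially, which is acceptable when the access order is known in advance.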
