Energy-Efficient Reconfigurable Computing Using a Circuit-Architecture-Software Co-Design Approach

Reconfigurable computing frameworks such as field-programmable gate arrays (FPGAs) provide the flexibility to map arbitrary applications. However, their intrinsic flexibility comes at the cost of significantly worse performance and power dissipation than their custom counterparts. Existing design solutions such as voltage scaling and multi-threshold assignment typically trade off energy for performance or vice versa. In this paper, we show that an integrated circuit-architecture-software co-design approach can be extremely effective in simultaneously improving the power and performance of a reconfigurable hardware framework, leading to a large improvement in energy-delay product (EDP). First, we select a spatio-temporal reconfigurable computing architecture based on a 2-D memory array. Applications are mapped to memory as multiple-input multiple-output lookup tables (LUTs) and are evaluated in a temporal manner inside a computing element. Multiple such computing elements communicate spatially through programmable interconnects. Next, we exploit the read-dominant memory access pattern in reconfigurable hardware to design an asymmetric memory cell, which provides higher read performance and lower read power, leading to an improvement in the overall EDP during operation. We note that the proposed memory cell is also asymmetric in terms of its content, providing better read power for one of the logic states (logic “0” or “1”). Based on this observation, we then propose a content-aware application mapping approach, which tries to maximize the logic “0” or logic “1” content in the lookup tables. A design flow is presented to incorporate the proposed architecture, asymmetric memory cell design, and content-aware mapping.
We show that for both nanoscale complementary metal-oxide-semiconductor (CMOS) [static random access memory (SRAM)] and emerging non-CMOS [spin torque transfer random access memory (STTRAM)] memory technologies, such a co-design solution can achieve significant improvement in system EDP over a conventional FPGA framework.
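
The idea behind content-aware mapping can be illustrated with a minimal Python sketch. It is not the paper's mapping algorithm, which the abstract does not detail. It assumes a single-output LUT stored as a list of bits, and it assumes that storing the complemented table is acceptable because the inversion can be absorbed elsewhere in the mapped design. The function names and the per-read energy values are hypothetical and in arbitrary units.

```python
from typing import List, Tuple


def favored_fraction(bits: List[int], favored: int = 0) -> float:
    """Fraction of LUT entries equal to the favored (cheaper-to-read) state."""
    return sum(1 for b in bits if b == favored) / len(bits)


def content_aware_polarity(bits: List[int], favored: int = 0) -> Tuple[List[int], bool]:
    """Return (stored_bits, inverted).

    The complemented table is stored only when it holds strictly more
    entries in the favored state; the flag records that the output must
    be inverted downstream (an assumption of this sketch).
    """
    complement = [1 - b for b in bits]
    if favored_fraction(complement, favored) > favored_fraction(bits, favored):
        return complement, True
    return list(bits), False


def average_read_energy(bits: List[int], e_read0: float, e_read1: float) -> float:
    """Mean read energy, assuming every LUT entry is read equally often."""
    zeros = bits.count(0)
    ones = len(bits) - zeros
    return (zeros * e_read0 + ones * e_read1) / len(bits)


def energy_delay_product(energy: float, delay: float) -> float:
    """EDP is the product of energy and delay."""
    return energy * delay


# 3-input OR truth table: mostly ones, so the complement favors logic "0".
or3 = [0, 1, 1, 1, 1, 1, 1, 1]
stored, inverted = content_aware_polarity(or3, favored=0)

# Hypothetical read energies: reading "0" costs 1.0, reading "1" costs 2.0.
before = average_read_energy(or3, e_read0=1.0, e_read1=2.0)
after = average_read_energy(stored, e_read0=1.0, e_read1=2.0)
```

With these illustrative numbers, the stored table holds seven zeros instead of one, so the average read energy falls. At a fixed read delay the EDP falls in the same proportion, since EDP is the product of the two.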
