Reconfigurable STT-NV LUT-based functional units to improve performance in general-purpose processors

Unavailability of functional units is a major performance bottleneck in general-purpose processors (GPPs). In a GPP with a limited number of functional units, one unit may be heavily utilized at times, creating a performance bottleneck, while the other functional units are under-utilized. We propose a novel approach that adapts the functional units of a GPP architecture to overcome this challenge. A selected set of complex functional units that might be under-utilized, such as the multiplier and divider, is realized using a programmable look-up table (LUT) based fabric. This allows run-time adaptation of functional units to improve performance. The programmable look-up tables are realized using magnetic tunnel junction (MTJ) based memories, which dissipate near-zero leakage and are CMOS compatible. We have applied this idea to a dual-issue architecture. The results show that, compared to a design with all-CMOS functional units, an average performance improvement of 18% is achieved on standard benchmarks. This comes with a 4.1% power increase in integer benchmarks and a 2.3% power decrease in floating-point benchmarks, compared to a CMOS design.
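
The idea of a functional unit whose behavior is stored in a programmable look-up table can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the hardware design evaluated in the paper: the class `LutUnit`, its methods, the 4-bit operand width, and the divide-by-zero convention are all hypothetical, and a real fabric would be built from many small LUTs rather than one table indexed by both operands.

```python
from typing import Callable, Dict, Tuple


class LutUnit:
    """Toy model (hypothetical) of a functional unit backed by a programmable table."""

    def __init__(self, width: int = 4) -> None:
        self.width = width                      # operand width in bits (assumed)
        self.table: Dict[Tuple[int, int], int] = {}
        self.op_name = "unprogrammed"

    def program(self, op_name: str, fn: Callable[[int, int], int]) -> None:
        """Reconfigure the unit by rewriting every table entry for a new operation."""
        size = 1 << self.width
        self.table = {(a, b): fn(a, b) for a in range(size) for b in range(size)}
        self.op_name = op_name

    def execute(self, a: int, b: int) -> int:
        """Evaluate the current operation with a single table lookup."""
        return self.table[(a, b)]


def multiply(a: int, b: int) -> int:
    return a * b


def divide(a: int, b: int) -> int:
    # Assumption for this sketch only: division by zero returns 0.
    return a // b if b != 0 else 0


unit = LutUnit(width=4)

# Configure the unit as a multiplier while multiplications are in demand.
unit.program("mul", multiply)
product = unit.execute(7, 6)

# Later, adapt the same unit at run time to act as a divider.
unit.program("div", divide)
quotient = unit.execute(13, 4)
```

In this model the same storage serves as a multiplier or a divider depending on what was last written into it, which mirrors the run-time adaptation described above at a purely functional level; it says nothing about timing, power, or the MTJ technology itself.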
