Application-driven power efficient ALU design methodology for modern microprocessors

In this paper, we propose an application-driven ALU design methodology that achieves a high level of power efficiency in modern microprocessors. Based on a detailed analysis of dynamic circuits, we introduce a PN selection algorithm (PNSA) that enables designers to select power-efficient dynamic modules for different applications. Experimental results on ISCAS85 and 74X-Series benchmark circuits show that, compared with a conventional dynamic ALU design, the power consumption of an 8-bit ALU designed with this approach is reduced by 54% to 60% across different frequency levels. These results demonstrate the effectiveness of the proposed method for application-driven custom ALU design.
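The abstract says PNSA picks power-efficient dynamic modules per application but does not spell out the procedure. The following is a minimal sketch under an explicit assumption: the choice is treated as a multiple-choice knapsack problem. Each ALU module has several candidate dynamic implementations, each with an application-weighted power figure and an integer delay cost. Exactly one variant is chosen per module so that total power is minimized while the summed delay stays within a budget. This is not PNSA itself. The names `Variant`, `select_variants`, and `example_modules` and all numbers are hypothetical, and modelling delay as a simple sum along one path is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One candidate implementation of an ALU module (hypothetical data)."""
    name: str
    power: float  # application-weighted power, arbitrary units (assumed)
    delay: int    # integer delay cost, summed across modules (simplification)


def select_variants(modules, delay_budget):
    """Multiple-choice knapsack via dynamic programming over total delay.

    modules: list of lists of Variant, one inner list per ALU module.
    Returns (min_total_power, [chosen names]) or None if no selection fits.
    """
    # dp[d] = (min power, names) over partial selections whose summed
    # delay is exactly d; None marks an unreachable delay value.
    dp = [None] * (delay_budget + 1)
    dp[0] = (0.0, [])
    for variants in modules:
        nxt = [None] * (delay_budget + 1)
        for d, state in enumerate(dp):
            if state is None:
                continue
            power, names = state
            # Exactly one variant must be taken from each module.
            for v in variants:
                nd = d + v.delay
                if nd > delay_budget:
                    continue
                cand = power + v.power
                if nxt[nd] is None or cand < nxt[nd][0]:
                    nxt[nd] = (cand, names + [v.name])
        dp = nxt
    reachable = [s for s in dp if s is not None]
    if not reachable:
        return None
    return min(reachable, key=lambda s: s[0])


# Hypothetical three-module ALU, each module with a fast and a low-power variant.
example_modules = [
    [Variant("adder_fast", 4.0, 3), Variant("adder_lowpower", 2.5, 5)],
    [Variant("logic_fast", 1.5, 2), Variant("logic_lowpower", 0.75, 3)],
    [Variant("shift_fast", 2.0, 2), Variant("shift_lowpower", 1.25, 4)],
]

if __name__ == "__main__":
    for budget in (6, 7, 10):
        print(budget, select_variants(example_modules, budget))
```

With a delay budget of 10, the sketch trades the fast adder and logic unit for their low-power variants and keeps the fast shifter (total power 5.25). A budget of 7 forces all-fast variants, and a budget of 6 has no feasible selection. The DP runs in time proportional to the number of variants times the delay budget, which stays small for an 8-bit ALU with a handful of modules.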
