How device properties influence energy-delay metrics and the energy-efficiency of parallel computations

Semiconductor device engineers are hard-pressed to relate the observed device-level properties of potential CMOS replacements to computational performance. We address this challenge by developing a model that links device properties to algorithm parallelism, total computational work, and the degree of voltage and frequency scaling. We then use the model to show how device properties influence the execution time, average power dissipation, and overall energy usage of parallel algorithms executing in the presence of hardware concurrency. The model also facilitates the study of tradeoffs: it lets researchers formulate joint energy-delay metrics that account for device properties. We support our analysis with data from a dozen large digital circuit designs, and we validate the models we present using performance and power measurements of a parallel algorithm executing on a state-of-the-art low-power multicore processor.
