Computation with Energy-Time Trade-Offs: Models, Algorithms and Lower-Bounds

Power consumption has become one of the most critical concerns for processor design. This motivates designing algorithms for minimum execution time subject to energy constraints. We propose simple models for analysing algorithms that reflect the energy-time trade-offs of CMOS circuits. Using these models, we derive lower bounds for the energy-constrained execution time of sorting, addition and multiplication, and we present algorithms that meet these bounds. We show that minimizing time under energy constraints is not the same as minimizing operation count or computation depth.
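The final claim can be illustrated with a small sketch. The abstract does not define its models, so everything below is an assumption made for illustration: each operation may be run in any time t at an energy cost of t^(-alpha), so that energy times t^alpha is constant; operations are grouped into layers that run one after another; all operations within a layer run in parallel at the same speed; and communication costs are ignored. Under these assumptions, a Lagrange-multiplier argument gives the minimum total time for layer widths w_1, ..., w_d and energy budget E as S^((alpha+1)/alpha) / E^(1/alpha), where S is the sum of w_j^(1/(alpha+1)). The function name `layered_time` and the layer profiles are hypothetical.

```python
def layered_time(widths, energy, alpha=1.0):
    """Minimum total time for a layered computation under the ASSUMED model
    (energy per operation) * (time per operation)**alpha == 1.

    widths[j] is the number of operations in layer j. Layers run one after
    another; operations inside a layer run in parallel at equal speed.
    Communication costs are ignored in this sketch.
    """
    # Optimal layer times are proportional to w_j ** (1 / (alpha + 1)).
    s = sum(w ** (1.0 / (alpha + 1.0)) for w in widths)
    return s ** ((alpha + 1.0) / alpha) / energy ** (1.0 / alpha)


budget = 7.0  # hypothetical total energy budget

chain = layered_time([1] * 7, budget)      # 7 operations, depth 7
tree = layered_time([4, 2, 1], budget)     # 7 operations, depth 3
wide = layered_time([3, 3, 3], budget)     # 9 operations, depth 3
shallow = layered_time([100, 1], budget)   # 101 operations, depth 2

for name, t in [("chain", chain), ("tree", tree),
                ("wide", wide), ("shallow", shallow)]:
    print(f"{name:8s} time = {t:.3f}")
```

With the same budget, the 9-operation profile finishes sooner than the 7-operation chain, while the depth-2 profile is the slowest of the four, so in this toy model neither the fewest operations nor the smallest depth gives the shortest time.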

[1]  K. Steinhubl Design of Ion-Implanted MOSFET'S with Very Small Physical Dimensions , 1974 .

[2]  Tei-Wei Kuo,et al.  Multiprocessor energy-efficient scheduling with task migration considerations , 2004, Proceedings. 16th Euromicro Conference on Real-Time Systems, 2004. ECRTS 2004..

[3]  Mark D. Hill,et al.  Amdahl's Law in the Multicore Era , 2008 .

[4]  Scott Shenker,et al.  Scheduling for reduced CPU energy , 1994, OSDI '94.

[5]  Kirk Pruhs,et al.  Dynamic speed scaling to manage energy and temperature , 2004, 45th Annual IEEE Symposium on Foundations of Computer Science.

[6]  Samuel Williams,et al.  The Landscape of Parallel Computing Research: A View from Berkeley , 2006 .

[7]  Kunle Olukotun,et al.  Niagara: a 32-way multithreaded Sparc processor , 2005, IEEE Micro.

[8]  H. T. Kung,et al.  A Regular Layout for Parallel Adders , 1982, IEEE Transactions on Computers.

[9]  C. Thomborson,et al.  A Complexity Theory for VLSI , 1980 .

[10]  Behrooz Parhami,et al.  Computer arithmetic - algorithms and hardware designs , 1999 .

[11]  Mark Horowitz,et al.  Energy dissipation in general purpose microprocessors , 1996, IEEE J. Solid State Circuits.

[12]  R. D. Valentine,et al.  The Intel Pentium M processor: Microarchitecture and performance , 2003 .

[13]  Samuel Williams,et al.  The potential of the cell processor for scientific computing , 2005, CF '06.

[14]  Franco P. Preparata,et al.  A Mesh-Connected Area-Time Optimal VLSI Multiplier of Large Integers , 1983, IEEE Transactions on Computers.

[15]  C. Mead,et al.  Fundamental limitations in microelectronics—I. MOS technology , 1972 .

[16]  William J. Dally,et al.  A VLSI Architecture for Concurrent Data Structures , 1987 .

[17]  Lawrence T. Clark,et al.  An embedded 32-b microprocessor core for low-power and high-performance applications , 2001 .

[18]  Alain J. Martin Towards an energy complexity of computation , 2001, Inf. Process. Lett..

[19]  H. T. Kung,et al.  The Area-Time Complexity of Binary Multiplication , 1981, JACM.

[20]  Harold S. Stone,et al.  A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations , 1973, IEEE Transactions on Computers.

[21]  Erol Gelenbe,et al.  Multiprocessor Performance , 1990, SIGMETRICS Perform. Evaluation Rev..

[22]  Donald Ervin Knuth,et al.  The Art of Computer Programming , 1968 .

[23]  Trevor N. Mudge,et al.  Power: A First-Class Architectural Design Constraint , 2001, Computer.

[24]  Jo C. Ebergen,et al.  Transistor sizing: how to control the speed and energy consumption of a circuit , 2004, 10th International Symposium on Asynchronous Circuits and Systems, 2004. Proceedings..

[25]  M. Scott,et al.  Profile-based dynamic voltage and frequency scaling for a multiple clock domain microprocessor , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..

[26]  Adi Shamir,et al.  An optimal sorting algorithm for mesh connected computers , 1986, STOC '86.

[27]  Donald E. Knuth,et al.  Computer programming as an art , 1974, CACM.

[28]  Tei-Wei Kuo,et al.  Power-Saving Scheduling for Weakly Dynamic Voltage Scaling Devices , 2005, WADS.

[29]  Susanne Albers,et al.  Speed scaling on parallel processors , 2007, SPAA.

[30]  Brad D Bingham,et al.  Energy-Time Complexity of Algorithms: Modelling the Trade-offs of CMOS VLSI , 2007 .

[31]  Martin Hopkins,et al.  Synergistic Processing in Cell's Multicore Architecture , 2006, IEEE Micro.

[32]  F. Frances Yao,et al.  A scheduling model for reduced CPU energy , 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.

[33]  Sartaj Sahni,et al.  Programming a hypercube multicomputer , 1988, IEEE Software.

[34]  E. Alon,et al.  The implementation of a 2-core, multi-threaded itanium family processor , 2006, IEEE Journal of Solid-State Circuits.

[35]  Diana Marculescu,et al.  Power and performance evaluation of globally asynchronous locally synchronous processors , 2002, ISCA.

[36]  Charles E. Leiserson,et al.  Fat-trees: Universal networks for hardware-efficient supercomputing , 1985, IEEE Transactions on Computers.