Using fine grain multithreading for energy efficient computing

We investigate extremely fine-grain multithreading as a means for improving energy efficiency of single-task program execution.Our work is based on low-overhead threads executing an explicitly parallel program in a register-sharing context. The thread-based parallelism takes the place of instruction-level parallelism, allowing us to use simple and more energy-efficient in-order pipelines while retaining performance that is characteristic of classical out-of-order processors. Our evaluation shows that in energy terms, the parallelized code running over in-order pipelines can outperform both plain in-order and out-of-order processors.

[1]  Ramon Canal,et al.  Reducing the complexity of the issue logic , 2001, ICS '01.

[2]  Jaejin Lee,et al.  Compilation techniques for explicitly parallel programs , 1999 .

[3]  David M. Brooks,et al.  An Adaptive Issue Queue for Reduced Power at High Performance , 2000, PACS.

[4]  Mateo Valero,et al.  Multiple-banked register file architectures , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[5]  André Seznec,et al.  Out-of-order execution may not be cost-effective on processors featuring simultaneous multithreading , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.

[6]  Anoop Gupta,et al.  Memory consistency and event ordering in scalable shared-memory multiprocessors , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.

[7]  Doug Burger,et al.  Evaluating Future Microprocessors: the SimpleScalar Tool Set , 1996 .

[8]  Chris R. Jesshope,et al.  Micro-threading: a new approach to future RISC , 2000, Proceedings 5th Australasian Computer Architecture Conference. ACAC 2000 (Cat. No.PR00512).

[9]  Kevin Skadron,et al.  Understanding the energy efficiency of simultaneous multithreading , 2004, Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758).

[10]  LiJian,et al.  Power-performance considerations of parallel computing on chip multiprocessors , 2005 .

[11]  ValeroMateo,et al.  Multiple-banked register file architectures , 2000 .

[12]  Jack L. Lo,et al.  Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[13]  Anoop Gupta,et al.  Memory consistency and event ordering in scalable shared-memory multiprocessors , 1990, ISCA '90.

[14]  Allen,et al.  Optimizing Compilers for Modern Architectures , 2004 .

[15]  Antonio González,et al.  Energy-effective issue logic , 2001, ISCA 2001.

[16]  Wen-mei W. Hwu,et al.  "Flea-flicker" multipass pipelining: an alternative to the high-power out-of-order offense , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).

[17]  Chris R. Jesshope Scalable Instruction-Level Parallelism , 2004, SAMOS.

[18]  John L. Henning SPEC CPU2000: Measuring CPU Performance in the New Millennium , 2000, Computer.

[19]  Dean M. Tullsen,et al.  Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[20]  Yen-Kuang Chen,et al.  The energy efficiency of CMP vs. SMT for multimedia workloads , 2004, ICS '04.

[21]  Miodrag Potkonjak,et al.  MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[22]  Ken Kennedy,et al.  Optimizing Compilers for Modern Architectures: A Dependence-based Approach , 2001 .

[23]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[24]  Dirk Grunwald,et al.  Pipeline gating: speculation control for energy reduction , 1998, ISCA.

[25]  Dean M. Tullsen,et al.  Supporting fine-grained synchronization on a simultaneous multithreading processor , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.

[26]  Mark D. Hill,et al.  A Unified Formalization of Four Shared-Memory Models , 1993, IEEE Trans. Parallel Distributed Syst..

[27]  Rajeev Balasubramonian,et al.  Reducing the complexity of the register file in dynamic superscalar processors , 2001, MICRO.

[28]  Mark Horowitz,et al.  Energy dissipation in general purpose microprocessors , 1996, IEEE J. Solid State Circuits.

[29]  Dean M. Tullsen,et al.  Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[30]  Krste Asanovic,et al.  Banked multiported register files for high-frequency superscalar microprocessors , 2003, ISCA '03.

[31]  Jian Li,et al.  Power-performance considerations of parallel computing on chip multiprocessors , 2005, TACO.

[32]  L. Dagum,et al.  OpenMP: an industry standard API for shared-memory programming , 1998 .

[33]  Dirk Grunwald,et al.  Data flow equations for explicitly parallel programs , 1993, PPOPP '93.

[34]  James Hook,et al.  Static single assignment for explicitly parallel programs , 1993, POPL '93.