Paper information - Using fine grain multithreading for energy efficient computing

We investigate extremely fine-grain multithreading as a means of improving the energy efficiency of single-task program execution. Our work is based on low-overhead threads that execute an explicitly parallel program in a register-sharing context. Thread-based parallelism takes the place of instruction-level parallelism, allowing us to use simpler, more energy-efficient in-order pipelines while retaining the performance characteristic of classical out-of-order processors. Our evaluation shows that, in energy terms, the parallelized code running on in-order pipelines can outperform both plain in-order and out-of-order processors.
