A comparative simulation study on the power–performance of multi-core architecture

Nowadays, multi-core processor is the main technology used in desktop PCs, laptop computers and mobile hardware platforms. As the number of cores on a chip keeps increasing, it adds up the complexity and impacts more on both power and performance of a processor. In multi-processors, the number of cores and various parameters, such as issue-width, number of instructions and execution time, are key design factors to balance the amount of thread-level parallelism and instruction-level parallelism. In this paper, we perform a comprehensive simulation study that aims to find the optimum number of processor cores in desktop/laptop computing processor models with shallow pipeline depth. This paper also explores the trade-off between the number of cores and different parameters used in multi-processors in terms of power–performance gains and analyzes the impact of 3D stacking on the design of simultaneous multi-threading and chip multiprocessing. Our analysis shows that the optimum number of cores varies with different classes of workloads, namely: SPEC2000, SPEC2006 and MiBench. Simulation study is presented using architectures with shorter pipeline depth, showing that (1) the optimum number of cores for power–performance is 8, (2) the optimum number of threads in the range [2, 4], and (3) for beyond 32 cores, multi-core processors are no longer efficient in terms of performance benefits and overall power consumption.

[1]  Sriram Sankar,et al.  Server Engineering Insights for Large-Scale Online Services , 2010, IEEE Micro.

[2]  Mark Horowitz,et al.  Energy dissipation in general purpose microprocessors , 1996, IEEE J. Solid State Circuits.

[3]  Eric Rotenberg,et al.  A case for dynamic pipeline scaling , 2002, CASES '02.

[4]  Alagan Anpalagan,et al.  An analytical study of resource division and its impact on power and performance of multi-core processors , 2014, The Journal of Supercomputing.

[5]  Margaret Martonosi,et al.  An Efficient, Practical Parallelization Methodology for Multicore Architecture Simulation , 2006, IEEE Computer Architecture Letters.

[6]  Trevor Mudge,et al.  MiBench: A free, commercially representative embedded benchmark suite , 2001 .

[7]  John Paul Shen,et al.  Mitigating Amdahl's law through EPI throttling , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[8]  Ricardo Bianchini,et al.  Power and energy management for server systems , 2004, Computer.

[9]  Nan Jiang,et al.  Evaluation of application-aware heterogeneous embedded systems for performance and energy consumption , 2003, The 9th IEEE Real-Time and Embedded Technology and Applications Symposium, 2003. Proceedings..

[10]  Abeer Hyari,et al.  A Comparative Study on Heterogeneous and Homogeneous Multiprocessors , 2009 .

[11]  James E. Smith,et al.  Optimal Pipelining in Supercomputers , 1986, ISCA.

[12]  Antonio González,et al.  Energy-effective issue logic , 2001, ISCA 2001.

[13]  Sumedh W. Sathaye,et al.  A technique to determine power-efficient, high-performance superscalar processors , 1995, Proceedings of the Twenty-Eighth Annual Hawaii International Conference on System Sciences.

[14]  Todd M. Austin,et al.  SimpleScalar: An Infrastructure for Computer System Modeling , 2002, Computer.

[15]  Zeshan Chishti,et al.  Optimal Power/Performance Pipeline Depth for SMT in Scaled Technologies , 2008, IEEE Transactions on Computers.

[16]  Kevin Skadron,et al.  Performance, energy, and thermal considerations for SMT and CMP architectures , 2005, 11th International Symposium on High-Performance Computer Architecture.

[17]  Alan Jay Smith,et al.  PACE: a new approach to dynamic voltage scaling , 2004, IEEE Transactions on Computers.

[18]  Li Shang,et al.  Multi-Optimization power management for chip multiprocessors , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[19]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[20]  V. Zyuban,et al.  Unified methodology for resolving power-performance tradeoffs at the microarchitectural and circuit levels , 2002, Proceedings of the International Symposium on Low Power Electronics and Design.

[21]  Karthikeyan Sankaralingam,et al.  SimpleScalar Simulation of the PowerPC Instruction Set Architecture , 2001 .

[22]  Thomas R. Puzak,et al.  Optimum power/performance pipeline depth , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[23]  Mayan Moudgill,et al.  Environment for PowerPC microarchitecture exploration , 1999, IEEE Micro.

[24]  Trevor Mudge,et al.  Dynamic voltage scaling on a low-power microprocessor , 2001 .

[25]  Kevin Skadron,et al.  CMP design space exploration subject to physical constraints , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[26]  Ulrich Kremer,et al.  The design, implementation, and evaluation of a compiler algorithm for CPU energy reduction , 2003, PLDI '03.

[27]  Eric Sprangle,et al.  Increasing processor performance by implementing deeper pipelines , 2002, ISCA.

[28]  Luiz André Barroso,et al.  Piranha: a scalable architecture based on single-chip multiprocessing , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[29]  Srilatha Manne,et al.  Power and energy reduction via pipeline balancing , 2001, ISCA 2001.

[30]  Rolf Riesen,et al.  A framework for architecture-level power, area, and thermal simulation and its application to network-on-chip design exploration , 2011, PERV.

[31]  Manish Gupta,et al.  Power-Aware Microarchitecture: Design and Modeling Challenges for Next-Generation Microprocessors , 2000, IEEE Micro.

[32]  Todd Austin A user's and hacker's guide to the simplescalar architectural research tool set , 1997 .

[33]  Dirk Grunwald,et al.  A Comparison of Two Architectural Power Models , 2000, PACS.

[34]  Pradip Bose,et al.  Optimizing pipelines for power and performance , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings..

[35]  Michael Gschwind,et al.  New methodology for early-stage, microarchitecture-level power-performance analysis of microprocessors , 2003, IBM J. Res. Dev..

[36]  Vikas Agarwal,et al.  Clock rate versus IPC: the end of the road for conventional microarchitectures , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[37]  Ryan E. Grant,et al.  Power-performance efficiency of asymmetric multiprocessors for multi-threaded scientific applications , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.

[38]  Naraig Manjikian Multiprocessor enhancements of the SimpleScalar tool set , 2001, CARN.

[39]  Michael Franz,et al.  Power reduction techniques for microprocessor systems , 2005, CSUR.

[40]  Dean M. Tullsen,et al.  Simultaneous multithreading: a platform for next-generation processors , 1997, IEEE Micro.