Power and Energy Characterization of an Open Source 25-Core Manycore Processor

The end of Dennard’s scaling and the looming power wall have made power and energy primary design goals for modern processors. Further, new applications such as cloud computing and Internet of Things (IoT) continue to necessitate increased performance and energy efficiency. Manycore processors show potential in addressing some of these issues. However, there is little detailed power and energy data on manycore processors. In this work, we carefully study detailed power and energy characteristics of Piton, a 25-core modern open source academic processor, including voltage versus frequency scaling, energy per instruction (EPI), memory system energy, network-on-chip (NoC) energy, thermal characteristics, and application performance and power consumption. This is the first detailed power and energy characterization of an open source manycore design implemented in silicon. The open source nature of the processor provides increased value, enabling detailed characterization verified against simulation and the ability to correlate results with the design and register transfer level (RTL) model. Additionally, this enables other researchers to utilize this work to build new power models, devise new research directions, and perform accurate power and energy research using the open source processor. The characterization data reveals a number of interesting insights, including that operand values have a large impact on EPI, recomputing data can be more energy efficient than loading it from memory, on-chip data transmission (NoC) energy is low, and insights on energy efficient multithreaded core design. All data collected and the hardware infrastructure used is open source and available for download at http://www.openpiton.org.

[1]  Anantha Chandrakasan,et al.  SCORPIO: A 36-core research chip demonstrating snoopy coherence on a scalable mesh NoC with in-network ordering , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[2]  Bruce Jacob,et al.  Instruction-level power dissipation in the Intel XScale embedded microprocessor , 2005, IS&T/SPIE Electronic Imaging.

[3]  Lingjia Tang,et al.  PowerChop: Identifying and Managing Non-critical Units in Hybrid Processor Architectures , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[4]  George Chrysos,et al.  Intel® Xeon Phi coprocessor (codename Knights Corner) , 2012, 2012 IEEE Hot Chips 24 Symposium (HCS).

[5]  Avinash Sodani,et al.  Knights landing (KNL): 2nd Generation Intel® Xeon Phi processor , 2015, 2015 IEEE Hot Chips 27 Symposium (HCS).

[6]  Massoud Pedram,et al.  Power punch: Towards non-blocking power-gating of NoC routers , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[7]  William J. Starke POWER7: IBM's next generation, balanced POWER server chip , 2009, 2009 IEEE Hot Chips 21 Symposium (HCS).

[8]  David M. Brooks,et al.  Energy characterization and instruction-level energy model of Intel's Xeon Phi processor , 2013, International Symposium on Low Power Electronics and Design (ISLPED).

[9]  Hokeun Kim,et al.  Strober: Fast and Accurate Sample-Based Energy Simulation for Arbitrary RTL , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[10]  Margaret Martonosi,et al.  Runtime power monitoring in high-end processors: methodology and empirical data , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[11]  Kevin Skadron,et al.  HotSpot: a compact thermal modeling methodology for early-stage VLSI design , 2006, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[12]  Eric Rotenberg,et al.  Rationale for a 3D heterogeneous multi-core processor , 2013, 2013 IEEE 31st International Conference on Computer Design (ICCD).

[13]  Jörg Henkel,et al.  TAPE: Thermal-aware agent-based power econom multi/many-core architectures , 2009, 2009 IEEE/ACM International Conference on Computer-Aided Design - Digest of Technical Papers.

[14]  William J. Dally,et al.  SLIP: Reducing wire energy in the memory hierarchy , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[15]  Yunsup Lee,et al.  A 45nm 1.3GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators , 2014, ESSCIRC 2014 - 40th European Solid State Circuits Conference (ESSCIRC).

[16]  David Wentzlaff,et al.  Coherence domain restriction on large scale systems , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[17]  David Wentzlaff,et al.  Execution Drafting: Energy Efficiency through Computation Deduplication , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[18]  Trevor Mudge,et al.  Centip3De: A 64-core, 3D stacked, near-threshold system , 2012 .

[19]  Kai Ma,et al.  Scalable power control for many-core architectures running multi-threaded applications , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[20]  Michael L. Scott,et al.  Energy-efficient processor design using multiple clock domains with dynamic voltage and frequency scaling , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[21]  Luiz André Barroso,et al.  The Case for Energy-Proportional Computing , 2007, Computer.

[22]  Henry Hoffmann,et al.  The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs , 2002, IEEE Micro.

[23]  Heba Khdr,et al.  TSP: Thermal Safe Power - Efficient power budgeting for many-core systems in dark silicon , 2014, 2014 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).

[24]  David Wentzlaff,et al.  Energy characterization of a tiled architecture processor with on-chip networks , 2003, ISLPED '03.

[25]  Michael Bedford Taylor,et al.  A Landscape of the New Dark Silicon Design Regime , 2013, IEEE Micro.

[26]  David Wentzlaff,et al.  OpenPiton: An Open Source Manycore Research Framework , 2016, ASPLOS.

[27]  K. Steinhubl Design of Ion-Implanted MOSFET'S with Very Small Physical Dimensions , 1974 .

[28]  Meeta Sharma Gupta,et al.  Systematic Energy Characterization of CMP/SMT Processor Systems via Automated Micro-Benchmarks , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[29]  Karthikeyan Sankaralingam,et al.  Dark Silicon and the End of Multicore Scaling , 2012, IEEE Micro.

[30]  Sriram R. Vangal,et al.  A 5-GHz Mesh Interconnect for a Teraflops Processor , 2007, IEEE Micro.

[31]  David Wentzlaff,et al.  Processor: A 64-Core SoC with Mesh Interconnect , 2010 .

[32]  Simha Sethumadhavan,et al.  Distributed Microarchitectural Protocols in the TRIPS Prototype Processor , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[33]  Steven Swanson,et al.  Conservation cores: reducing the energy of mature computations , 2010, ASPLOS XV.

[34]  Thomas Ilsche,et al.  An Energy Efficiency Feature Survey of the Intel Haswell Processor , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop.

[35]  Qinru Qiu,et al.  Distributed task migration for thermal management in many-core systems , 2010, Design Automation Conference.

[36]  Bishop Brock,et al.  Accurate Fine-Grained Processor Power Proxies , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[37]  Babak Falsafi,et al.  Toward Dark Silicon in Servers , 2011, IEEE Micro.

[38]  Carl Ramey,et al.  TILE-Gx100 ManyCore processor: Acceleration interfaces and architecture , 2011, 2011 IEEE Hot Chips 23 Symposium (HCS).

[39]  Natalie D. Enright Jerger,et al.  Improving DVFS in NoCs with Coherence Prediction , 2015, NOCS.

[40]  Narayanan Vijaykrishnan,et al.  Designing energy-efficient NoC for real-time embedded systems through slack optimization , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).

[41]  Ananta Tiwari,et al.  Characterizing the Performance-Energy Tradeoff of Small ARM Cores in HPC Computation , 2014, Euro-Par.

[42]  Eric Senn,et al.  Power Consumption Modeling and Characterization of the TI C6201 , 2003, IEEE Micro.

[43]  Lizy Kurian John,et al.  Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite , 2007, ISCA '07.

[44]  Saurabh Dighe,et al.  A 48-Core IA-32 Processor in 45 nm CMOS Using On-Die Message-Passing and DVFS for Performance and Power Scaling , 2011, IEEE Journal of Solid-State Circuits.

[45]  David Wentzlaff,et al.  MITTS: Memory Inter-arrival Time Traffic Shaping , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[46]  Andrew B. Kahng,et al.  ORION 2.0: A fast and accurate NoC power and area model for early-stage design space exploration , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.

[47]  David Wentzlaff,et al.  Piton: A Manycore Processor for Multitenant Clouds , 2017, IEEE Micro.

[48]  Charles Zhang Mars: A 64-core ARMv8 processor , 2015, 2015 IEEE Hot Chips 27 Symposium (HCS).

[49]  N. Gura,et al.  UltraSPARC T2: A highly-treaded, power-efficient, SPARC SOC , 2007, 2007 IEEE Asian Solid-State Circuits Conference.

[50]  Saibal Mukhopadhyay,et al.  Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits , 2003, Proc. IEEE.

[51]  Qijun Gu,et al.  Energy and Power Characterization of Parallel Programs Running on Intel Xeon Phi , 2014, 2014 43rd International Conference on Parallel Processing Workshops.

[52]  Kunle Olukotun,et al.  Niagara: a 32-way multithreaded Sparc processor , 2005, IEEE Micro.

[53]  Li Shang,et al.  Thermal Modeling, Characterization and Management of On-Chip Networks , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[54]  Scott A. Mahlke,et al.  Composite Cores: Pushing Heterogeneity Into a Core , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[55]  Eric Rotenberg,et al.  AnyCore: A synthesizable RTL model for exploring and fabricating adaptive superscalar cores , 2016, 2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[56]  David Wentzlaff,et al.  Piton: A 25-core academic manycore research processor , 2016, 2016 IEEE Hot Chips 28 Symposium (HCS).

[57]  John Sartori,et al.  Exploiting Dynamic Timing Slack for Energy Efficiency in Ultra-Low-Power Embedded Systems , 2016, ISCA.

[58]  Houman Homayoun,et al.  Managing distributed UPS energy for effective power capping in data centers , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[59]  Anantha Chandrakasan,et al.  SCORPIO: 36-core shared memory processor demonstrating snoopy coherence on a mesh interconnect , 2014, IEEE Hot Chips Symposium.

[60]  Wei Ge,et al.  The Sunway TaihuLight supercomputer: system and applications , 2016, Science China Information Sciences.

[61]  Elad Alon,et al.  A RISC-V Vector Processor With Simultaneous-Switching Switched-Capacitor DC–DC Converters in 28 nm FDSOI , 2016, IEEE Journal of Solid-State Circuits.