Efficient Mapping of Applications for Future Chip-Multiprocessors in Dark Silicon Era

The failure of Dennard scaling has led to the utilization wall that is the source of dark silicon and limits the percentage of a chip that can actively switch within a given power budget. To address this issue, a structure is needed to guarantee the limited power budget along with providing sufficient flexibility and performance for different applications with various communication requirements. In this article, we present a general-purpose platform for future many-core Chip-Multiprocessors (CMPs) that benefits from the advantages of clustering, Network-on-Chip (NoC) resource sharing among cores, and power gating the unused components of clusters. We also propose two task mapping methods for the proposed platform in which active and dark cores are dispersed appropriately, so that an excess of power budget can be obtained. Our evaluations reveal that the first and second proposed mapping mechanisms respectively reduce the execution time by up to 28.6% and 39.2% and the NoC power consumption by up to 11.1% and 10%, and gain an excess power budget of up to 7.6% and 13.4% over the baseline architecture.

[1]  Axel Jantsch,et al.  Dark silicon aware power management for manycore systems under dynamic workloads , 2014, 2014 IEEE 32nd International Conference on Computer Design (ICCD).

[2]  William J. Dally,et al.  Design tradeoffs for tiled CMP on-chip networks , 2006, ICS '06.

[3]  Karthikeyan Sankaralingam,et al.  Dark Silicon and the End of Multicore Scaling , 2012, IEEE Micro.

[4]  Vanchinathan Venkataramani,et al.  Hierarchical power management for asymmetric multi-core in dark silicon era , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).

[5]  Lieven Eeckhout,et al.  Sniper: Exploring the level of abstraction for scalable and accurate parallel multi-core simulation , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[6]  Radu Marculescu,et al.  Incremental run-time application mapping for homogeneous NoCs with multiple voltage levels , 2007, 2007 5th IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).

[7]  Chris Fallin,et al.  CHIPPER: A low-complexity bufferless deflection router , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[8]  Henry Hoffmann,et al.  The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs , 2002, IEEE Micro.

[9]  Cynthia A. Phillips,et al.  Communication-Aware Processor Allocation for Supercomputers , 2005, WADS.

[10]  Weiping Jing,et al.  Energy and thermal aware mapping for mesh-based NoC architectures using multi-objective ant colony algorithm , 2011, 2011 3rd International Conference on Computer Research and Development.

[11]  Jung Ho Ahn,et al.  McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[12]  Michael Bedford Taylor,et al.  Is dark silicon useful? Harnessing the four horsemen of the coming dark silicon apocalypse , 2012, DAC Design Automation Conference 2012.

[13]  Cheng Li,et al.  LumiNOC: A Power-Efficient, High-Performance, Photonic Network-on-Chip , 2012, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[14]  William J. Dally,et al.  Virtual-channel flow control , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.

[15]  Saurabh Dighe,et al.  A 48-Core IA-32 Processor in 45 nm CMOS Using On-Die Message-Passing and DVFS for Performance and Power Scaling , 2011, IEEE Journal of Solid-State Circuits.

[16]  Srinivasan Murali,et al.  Bandwidth-constrained mapping of cores onto NoC architectures , 2004, Proceedings Design, Automation and Test in Europe Conference and Exhibition.

[17]  Massoud Pedram,et al.  TAPP: Temperature-aware application mapping for NoC-based many-core processors , 2015, 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[18]  Li-Shiuan Peh,et al.  Leakage power modeling and optimization in interconnection networks , 2003, ISLPED '03.

[19]  Nan Jiang,et al.  A detailed and flexible cycle-accurate Network-on-Chip simulator , 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[20]  R. Schaller,et al.  Technological innovation in the semiconductor industry: A case study of the International Technology Roadmap for Semiconductors (ITRS) , 2001, PICMET '01. Portland International Conference on Management of Engineering and Technology. Proceedings Vol.1: Book of Summaries (IEEE Cat. No.01CH37199).

[21]  Heba Khdr,et al.  TSP: Thermal Safe Power - Efficient power budgeting for many-core systems in dark silicon , 2014, 2014 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).

[22]  Hongyi Wu,et al.  Shift sprinting: Fine-grained temperature-aware NoC-based MCSoC architecture in dark silicon age , 2016, 2016 53nd ACM/EDAC/IEEE Design Automation Conference (DAC).

[23]  Alexandre M. Amory,et al.  Multi-task dynamic mapping onto NoC-based MPSoCs , 2011, SBCCI '11.

[24]  Akram Reza,et al.  CoolMap: A Thermal-Aware Mapping Algorithm for Application Specific Networks-on-Chip , 2012, 2012 15th Euromicro Conference on Digital System Design.

[25]  Lizhong Chen,et al.  NoRD: Node-Router Decoupling for Effective Power-gating of On-Chip Routers , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[26]  William J. Dally,et al.  Principles and Practices of Interconnection Networks , 2004 .

[27]  Yuan Xie,et al.  DimNoC: A dim silicon approach towards power-efficient on-chip network , 2015, 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC).

[28]  Muhammad Shafique,et al.  Dark silicon as a challenge for hardware/software co-design , 2014, 2014 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).

[29]  Radu Marculescu,et al.  Energy- and Performance-Aware Incremental Mapping for Networks on Chip With Multiple Voltage Levels , 2008, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[30]  Sriram R. Vangal,et al.  A 5-GHz Mesh Interconnect for a Teraflops Processor , 2007, IEEE Micro.

[31]  Christian Bienia,et al.  PARSEC 2.0: A New Benchmark Suite for Chip-Multiprocessors , 2009 .

[32]  Steven Swanson,et al.  Conservation cores: reducing the energy of mature computations , 2010, ASPLOS XV.

[33]  Kai Ma,et al.  PGCapping: Exploiting power gating for power capping and core lifetime balancing in CMPs , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[34]  Axel Jantsch,et al.  Dark silicon aware runtime mapping for many-core systems: A patterning approach , 2015, 2015 33rd IEEE International Conference on Computer Design (ICCD).

[35]  Muhammad Shafique,et al.  The EDA challenges in the dark silicon era , 2014, 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC).

[36]  Vikram Bhatt,et al.  GreenDroid: An architecture for the Dark Silicon Age , 2012, 17th Asia and South Pacific Design Automation Conference.

[37]  Kevin Skadron,et al.  HotSpot: a compact thermal modeling methodology for early-stage VLSI design , 2006, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[38]  Steven Swanson,et al.  QSCORES: Trading dark silicon for scalable energy efficiency with quasi-specific cores , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[39]  B. Hoefflinger ITRS: The International Technology Roadmap for Semiconductors , 2011 .

[40]  Prabhat Kumar,et al.  Exploring concentration and channel slicing in on-chip network router , 2009, 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip.

[41]  Amit Kumar Singh,et al.  Communication-aware heuristics for run-time task mapping on NoC-based MPSoC platforms , 2010, J. Syst. Archit..

[42]  Yuan Xie,et al.  NoC-sprinting: Interconnect for fine-grained sprinting in the dark silicon era , 2014, 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC).

[43]  Norman P. Jouppi,et al.  CACTI 6.0: A Tool to Model Large Caches , 2009 .

[44]  Yingtao Jiang,et al.  A power-aware mapping approach to map IP cores onto NoCs under bandwidth and latency constraints , 2010, TACO.