SST + gem5 = a scalable simulation infrastructure for high performance computing

High Performance Computing (HPC) faces new challenges in scalability, performance, reliability, and power consumption. Solving these challenges will require radically new hardware and software approaches, and it is impractical to explore this vast design space without detailed system-level simulation. However, most existing simulators are either not sufficiently detailed, not scalable, or unable to evaluate key system characteristics such as energy consumption and reliability. To address this problem, we integrate the highly detailed gem5 performance simulator into the parallel Structural Simulation Toolkit (SST). We add fast-forwarding capability to SST/gem5 and port the lightweight Kitten operating system to gem5. In addition, we extend the reliability model in SST to support a comprehensive analysis of system reliability. Using this simulation framework, we evaluate the impact of two energy-efficient, resource-conscious scheduling policies on system reliability. Our results show that the effectiveness of each scheduling policy depends on the workload composition and the system topology.
