Node Performance and Energy Analysis with the Sniper Multi-core Simulator

Two major trends in high-performance computing, namely larger numbers of cores and growing on-chip cache sizes, are creating significant challenges for evaluating the design space of future processor architectures. Fast and scalable simulation is therefore needed to explore large multi-core systems sufficiently within a limited simulation time budget. By combining accurate, high-abstraction analytical models with fast parallel simulation, architects can trade off accuracy against simulation speed, which allows for longer application runs that cover a larger portion of the hardware design space. Sniper provides this balance: it models long-running simulations much faster than detailed cycle-accurate simulation, while still providing the detail necessary to observe core-uncore interactions across the entire system. With advanced per-function visualization and coupled power and energy simulation, the Sniper multi-core simulator provides a fast and accurate way to both understand and optimize software for current and future hardware systems.
