Reciprocal abstraction for computer architecture co-simulation

Co-simulation of computer architecture elements at different levels of abstraction and fidelity is becoming an increasing necessity for efficient experimentation and research. We propose reciprocal abstraction for computer architecture cosimulation, which allows the integration of simulation methods that utilize different levels of abstraction and fidelity of simulation. Further, reciprocal abstraction avoids the need to conduct detailed evaluations of individual computer architecture components entirely in a vacuum, which can lead to significant inaccuracies from ignoring the system context. Moreover, it allows an exploration of the impact on the full system resulting from design choices in the detailed component model. We demonstrate the potential inaccuracies of isolated component simulation. Using reciprocal abstraction, we integrate a parallel cycle-level networkon- chip (NoC) component into a detailed but more coarse-grain full system simulator.We show that co-simulation using reciprocal abstraction of the cycle-level network model reduces packet latency error compared to the more abstract network model by 69% on average. Additionally, as simulating a detailed network at the cycle-level can greatly increase simulation time over an abstract model, we implemented detailed network simulator using a GPU coprocessor. The CPU+GPU can reduce simulation time for the reciprocal abstraction co-simulation by 16% for a 256-core target machine and 65% for a 512-core target machine.

[1]  Somayeh Sardashti,et al.  The gem5 simulator , 2011, CARN.

[2]  Hyunjin Lee,et al.  TPTS: A Novel Framework for Very Fast Manycore Processor Architecture Simulation , 2008, 2008 37th International Conference on Parallel Processing.

[3]  Flávio Rech Wagner,et al.  A Standardized Co-simulation Backbone , 2001, VLSI-SOC.

[4]  Niraj K. Jha,et al.  GARNET: A detailed on-chip network model inside a full-system simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.

[5]  Stijn Eyerman,et al.  Interval simulation: Raising the level of abstraction in architectural simulation , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[6]  Nan Jiang,et al.  A detailed and flexible cycle-accurate Network-on-Chip simulator , 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[7]  Onur Mutlu,et al.  Improving GPU performance via large warps and two-level warp scheduling , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[8]  George Kurian,et al.  Graphite: A distributed parallel simulator for multicores , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[9]  James C. Hoe,et al.  FIST: A fast, lightweight, FPGA-friendly packet latency estimator for NoC modeling in full-system simulations , 2011, Proceedings of the Fifth ACM/IEEE International Symposium.

[10]  Lieven Eeckhout,et al.  Sniper: Exploring the level of abstraction for scalable and accurate parallel multi-core simulation , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[11]  Ahmed Amine Jerraya,et al.  MCI- Multilanguage Distributed Co- Simulation Tool , 1998, DIPES.

[12]  Luca Benini,et al.  Scalable instruction set simulator for thousand-core architectures running on GPGPUs , 2010, 2010 International Conference on High Performance Computing & Simulation.

[13]  Michael Wetter,et al.  Comparison of co-simulation approaches for building and HVAC/R system simulation , 2007 .

[14]  Sangyeun Cho,et al.  In-N-Out: Reproducing Out-of-Order Superscalar Processor Behavior from Reduced In-Order Traces , 2011, 2011 IEEE 19th Annual International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems.

[15]  Milo M. K. Martin,et al.  Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.

[16]  Wu-chun Feng,et al.  Inter-block GPU communication via fast barrier synchronization , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).

[17]  Paolo Faraboschi,et al.  COTSon: infrastructure for full system simulation , 2009, OPSR.

[18]  Christoforos E. Kozyrakis,et al.  ZSim: fast and accurate microarchitectural simulation of thousand-core systems , 2013, ISCA.

[19]  Rami G. Melhem,et al.  Weighted-Tuple Synchronization for Parallel Architecture Simulators , 2014, 2014 IEEE 22nd International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems.

[20]  Jlm Jan Hensen,et al.  Comparison of coupled and decoupled solutions for temperature and air flow in a building , 1999 .

[21]  Srinivas Devadas,et al.  Scalable, accurate multicore simulation in the 1000-core era , 2011, (IEEE ISPASS) IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE.

[22]  Rami G. Melhem,et al.  Scalable Multi-cache Simulation Using GPUs , 2011, 2011 IEEE 19th Annual International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems.

[23]  Christian Bienia,et al.  Benchmarking modern multiprocessors , 2011 .

[24]  Fredrik Larsson,et al.  Simics: A Full System Simulation Platform , 2002, Computer.

[25]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[26]  Jianwei Chen,et al.  SlackSim: a platform for parallel simulations of CMPs on CMPs , 2009, CARN.

[27]  Ronald G. Dreslinski,et al.  The M5 Simulator: Modeling Networked Systems , 2006, IEEE Micro.