InvadeSIM: A Simulation Framework for Invasive Parallel Programs and Architectures

In this chapter, novel, fast, and flexible simulation techniques for modern heterogeneous NoC-based multi-core architectures are presented. They include the design and development of the full-system simulator InvadeSIM, which allows modeling complex MPSoC architectures, emulating the execution behavior of the runtime system, and simulating the function and timing of invasive parallel applications, in addition to utilization, efficiency, and competition. A novel high-level processor simulation approach based on direct execution and a linear timing estimation model is proposed to tackle the complexity and heterogeneity of current multi- and many-core architectures. Furthermore, a discrete-event simulation framework is presented that allows integrating and synchronizing different simulation tasks, such as software and hardware simulations. Besides processor simulation, exemplary timing models are presented for hardware accelerators such as tightly-coupled processor arrays and special cores with instruction-set extensions.
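
To make the two central ideas concrete, the following minimal Python sketch combines a linear timing estimate with a discrete-event loop that interleaves a software task and a hardware task on one simulated time axis. It is an illustration under assumptions, not InvadeSIM's implementation: the names `estimate_target_cycles`, `EventSimulator`, `software_task`, and `hardware_task`, the `slope` and `offset` coefficients, and all cycle counts are hypothetical.

```python
import heapq
from itertools import count


def estimate_target_cycles(host_cycles, slope, offset=0.0):
    """Hypothetical linear timing model: target time as an affine function of host time."""
    return slope * host_cycles + offset


class EventSimulator:
    """Minimal discrete-event kernel: always resumes the task with the earliest timestamp."""

    def __init__(self):
        self.now = 0.0
        self._queue = []      # heap of (wake-up time, tie-breaker, task)
        self._tie = count()   # keeps heap entries comparable when times are equal

    def spawn(self, task, start=0.0):
        heapq.heappush(self._queue, (start, next(self._tie), task))

    def run(self):
        trace = []
        while self._queue:
            time, _, task = heapq.heappop(self._queue)
            self.now = time
            try:
                name, delay = next(task)   # task reports how long its next segment takes
            except StopIteration:
                continue                   # task finished, nothing to reschedule
            trace.append((self.now, name))
            heapq.heappush(self._queue, (self.now + delay, next(self._tie), task))
        return trace


def software_task(name, host_cycle_counts, slope):
    """Directly executed code segments, timed by scaling assumed host cycle counts."""
    for host_cycles in host_cycle_counts:
        yield name, estimate_target_cycles(host_cycles, slope)


def hardware_task(name, latencies):
    """Accelerator model that contributes fixed, assumed latencies."""
    for latency in latencies:
        yield name, latency


sim = EventSimulator()
sim.spawn(software_task("cpu0", [100, 200], slope=2.0))
sim.spawn(hardware_task("accel", [150, 150]))
trace = sim.run()
print(trace, sim.now)
```

Each trace entry records the simulated time at which a task starts its next segment. Because the kernel only ever advances to the earliest pending timestamp, software and hardware models with different timing sources stay ordered on a common clock.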
