MosaicSim: A Lightweight, Modular Simulator for Heterogeneous Systems

As Moore's Law has slowed and Dennard Scaling has ended, architects are increasingly turning to heterogeneous parallelism and domain-specific hardware-software co-designs. These trends present new challenges for simulation-based performance assessments that are central to early-stage architectural exploration. Simulators must be lightweight to support rich heterogeneous combinations of general purpose cores and specialized processing units. They must also support agile exploration of hardware-software co-design, i.e. changes in the programming model, compiler, ISA, and specialized hardware. To meet these challenges, we introduce MosaicSim, a lightweight, modular simulator for heterogeneous systems, offering accuracy and agility designed specifically for hardware-software co-design explorations. By integrating the LLVM toolchain, MosaicSim enables efficient modeling of instruction dependencies and flexible additions across the stack. Its modularity also allows the composition and integration of different hardware components. We first demonstrate that MosaicSim captures architectural bottlenecks in applications, and accurately models both scaling trends in a multicore setting and accelerator behavior. We then present two case-studies where MosaicSim enables straightforward design space explorations for emerging systems, i.e. data science application acceleration and heterogeneous parallel architectures.

[1]  Vikram S. Adve,et al.  LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004..

[2]  Luca P. Carloni,et al.  Broadening the exploration of the accelerator design space in embedded scalable platforms , 2017, 2017 IEEE High Performance Extreme Computing Conference (HPEC).

[3]  Somayeh Sardashti,et al.  The gem5 simulator , 2011, CARN.

[4]  Lieven Eeckhout,et al.  Sniper: Exploring the level of abstraction for scalable and accurate parallel multi-core simulation , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[5]  David A. Patterson,et al.  A new golden age for computer architecture , 2019, Commun. ACM.

[6]  Wen-mei W. Hwu,et al.  Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing , 2012 .

[7]  Jung Ho Ahn,et al.  McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[8]  Luca P. Carloni,et al.  Accelerators and Coherence: An SoC Perspective , 2018, IEEE Micro.

[9]  Yi-Cheng Zhang,et al.  Bipartite network projection and personal recommendation. , 2007, Physical review. E, Statistical, nonlinear, and soft matter physics.

[10]  James E. Smith,et al.  Decoupled access/execute computer architectures , 1984, TOCS.

[11]  Luca P. Carloni,et al.  ESP4ML: Platform-Based Design of Systems-on-Chip for Embedded Machine Learning , 2020, 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[12]  Gu-Yeon Wei,et al.  Co-designing accelerators and SoC interfaces using gem5-Aladdin , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[13]  Margaret Martonosi,et al.  DeSC: Decoupled supply-compute communication management for heterogeneous architectures , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[14]  Lei Wang,et al.  BNPMDA: Bipartite Network Projection for MiRNA–Disease Association prediction , 2018, Bioinform..

[15]  Luca P. Carloni,et al.  NoC-Based Support of Heterogeneous Cache-Coherence Models for Accelerators , 2018, 2018 Twelfth IEEE/ACM International Symposium on Networks-on-Chip (NOCS).

[16]  Stijn Eyerman,et al.  Interval simulation: Raising the level of abstraction in architectural simulation , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[17]  Jure Leskovec,et al.  Inductive Representation Learning on Large Graphs , 2017, NIPS.

[18]  David Wentzlaff,et al.  PriME: A parallel and distributed simulator for thousand-core chips , 2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[19]  Christoforos E. Kozyrakis,et al.  ZSim: fast and accurate microarchitectural simulation of thousand-core systems , 2013, ISCA.

[20]  Dean M. Tullsen,et al.  Harnessing ISA diversity: Design of a heterogeneous-ISA chip multiprocessor , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[21]  Matt J. Kusner,et al.  From Word Embeddings To Document Distances , 2015, ICML.

[22]  Bruce Jacob,et al.  DRAMSim2: A Cycle Accurate Memory System Simulator , 2011, IEEE Computer Architecture Letters.

[23]  Stijn Eyerman,et al.  An Evaluation of High-Level Mechanistic Core Models , 2014, ACM Trans. Archit. Code Optim..

[24]  Gu-Yeon Wei,et al.  Aladdin: A pre-RTL, power-performance accelerator simulator enabling large design space exploration of customized architectures , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[25]  C. Tomasi The Earth Mover's Distance, Multi-Dimensional Scaling, and Color-Based Image Retrieval , 1997 .

[26]  Hamed Tabkhi,et al.  Scalable LLVM-Based Accelerator Modeling in gem5 , 2019, IEEE Computer Architecture Letters.

[27]  Margaret Martonosi,et al.  Efficient Data Supply for Parallel Heterogeneous Architectures , 2019, ACM Trans. Archit. Code Optim..

[28]  David Wentzlaff,et al.  OpenPiton: An Open Source Manycore Research Framework , 2016, ASPLOS.

[29]  Siu Kwan Lam,et al.  Numba: a LLVM-based Python JIT compiler , 2015, LLVM '15.

[30]  Margaret Martonosi,et al.  Decoupling Data Supply from Computation for Latency-Tolerant Communication in Heterogeneous Architectures , 2017, ACM Trans. Archit. Code Optim..

[31]  Marco Cuturi,et al.  Sinkhorn Distances: Lightspeed Computation of Optimal Transport , 2013, NIPS.

[32]  George Kurian,et al.  Graphite: A distributed parallel simulator for multicores , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.