Adaptive Algorithm and Tool Flow for Accelerating SystemC on Many-Core Architectures

This paper presents an adaptive approach for parallel simulation of SystemC RTL models on future many-core architectures such as Intel's Single-chip Cloud Computer (SCC). It is based on a configurable parallel SystemC kernel that preserves the partial order defined by the SystemC delta cycles while avoiding global synchronization as far as possible. The underlying algorithm relies on a classification of the communication relations that exist between parallel processes. The type and topology of these relations determine the type and number of causality conditions that must be fulfilled at runtime. The parallel kernel is complemented by an automated tool flow that detects relevant model-specific properties, performs a fine-grained model partitioning, classifies communication relations, and configures the kernel. Experiments with an MPSoC model show that purely local synchronization can provide significant performance gains over global synchronization. Furthermore, combining local synchronization with fine-grained partitioning provides additional degrees of freedom for optimization.
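To illustrate why local synchronization can outperform a global barrier, the following is a minimal sketch of a toy timing model, not the kernel described in this paper. It assumes each process performs a fixed amount of work per delta cycle, and that a process may start its next delta cycle as soon as it and the processes it receives data from have finished the current one. The function names, the `preds` dependency map, and the cost values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def global_sync_makespan(cost: Dict[str, List[int]]) -> int:
    """Total time when all processes meet at a barrier after every delta cycle."""
    cycles = len(next(iter(cost.values())))
    # Each delta cycle lasts as long as its slowest process.
    return sum(max(c[k] for c in cost.values()) for k in range(cycles))


def local_sync_makespan(cost: Dict[str, List[int]],
                        preds: Dict[str, List[str]]) -> int:
    """Total time when a process waits only for the processes it reads from."""
    cycles = len(next(iter(cost.values())))
    finish = {p: 0 for p in cost}  # finish time of the previous delta cycle
    for k in range(cycles):
        new_finish = {}
        for p in cost:
            # Causality condition (assumed): the process itself and all of its
            # communication predecessors must have completed the previous cycle.
            ready = max([finish[p]] + [finish[q] for q in preds.get(p, [])])
            new_finish[p] = ready + cost[p][k]
        finish = new_finish
    return max(finish.values())


# Hypothetical example: B reads from A, C communicates with no one.
cost = {"A": [1, 3], "B": [3, 1], "C": [2, 2]}
preds = {"B": ["A"]}

print(global_sync_makespan(cost))        # barrier after every delta cycle
print(local_sync_makespan(cost, preds))  # neighbor-only synchronization
```

In this toy model, local synchronization degenerates to the global barrier when every process depends on every other process, which matches the intuition that the achievable gain depends on the topology of the communication relations.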
