A Virtualization-Assisted Full-System Simulation Approach for the Verification of System Intercomponent Interactions

In this article, we propose a full-system simulation approach, referred to as VIRA, that achieves near-real-time performance through hardware acceleration based on virtualization techniques. Traditional acceleration approaches generally cannot capture intercomponent interactions because the simulation progress of each component is unpredictable. Our approach leverages an existing hardware virtualization framework and devises three key implementation techniques to achieve fast and accurate full-system simulation. First, it uses the trap mechanism of the virtualization framework to intercept intercomponent interactions precisely, without checking every data access, while maintaining a deterministic chronological order of those interactions. Second, VIRA provides accurate system performance estimation for early system-level designs by integrating component timing models, interrupt effects, and bus contention analysis. Third, VIRA achieves near-real-time performance by executing the simulated software and hardware components on the same host machine, which minimizes the overhead of intercomponent data exchange. We implement the proposed approach on a virtualization-enabled off-the-shelf system-on-chip board to demonstrate its effectiveness. The experiments show that VIRA always produces deterministic results while running 58–625 times faster than a commercial tool, and its system performance estimates are within 3%–6% of real systems. Moreover, our deterministic full-system simulator incurs only 2%–57% overhead compared with ideal native execution on the same host hardware.
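The idea of keeping intercomponent interactions in a deterministic chronological order can be illustrated with a minimal sketch. It is not taken from the article: the `Interaction` record, the `deterministic_order` helper, the component identifiers, and the action names are all hypothetical, and the sketch assumes that each trapped interaction carries a simulated timestamp. It shows only that ordering by simulated time, with a fixed tie-breaker, yields the same sequence regardless of the order in which the host happens to deliver the traps.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Interaction:
    # Hypothetical record of one trapped intercomponent interaction.
    sim_time: int       # simulated timestamp (e.g. cycles) of the access
    component_id: int   # fixed tie-breaker for equal timestamps
    action: str = field(compare=False)  # payload, ignored when ordering


def deterministic_order(trapped):
    """Return interactions ordered by simulated time, not host arrival order."""
    heap = list(trapped)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]


# The same three interactions, delivered by the host in two different orders.
run_a = [
    Interaction(30, 1, "dma_start"),
    Interaction(10, 0, "cpu_write_reg"),
    Interaction(30, 0, "cpu_read_status"),
]
run_b = list(reversed(run_a))

order_a = [i.action for i in deterministic_order(run_a)]
order_b = [i.action for i in deterministic_order(run_b)]
print(order_a == order_b)
```

Both runs are processed in the same order because the comparison key is the pair of simulated time and component identifier, neither of which depends on host scheduling.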
