Full System Emulation of Embedded Heterogeneous Multicores Based on QEMU

Edge computing is poised to move computation and intelligence to the network's edge, close to the data sources, for faster responses and reduced network traffic. Edge devices must therefore support a wide variety of applications and services, ranging from data preprocessing and intelligent inference to multimedia human interfaces. Many of these applications are well suited to special-purpose hardware accelerators. As the number of accelerators on edge devices grows, a promising architecture is an asymmetric heterogeneous multicore that incorporates one or more microcontrollers to offload accelerator scheduling and interrupt handling from the main CPU, as exemplified by the NVIDIA Deep Learning Accelerator (NVDLA). Virtual platforms such as QEMU are often used to develop such systems. Unfortunately, QEMU supports only symmetric homogeneous multicore systems. In this paper, we tackle the challenging problem of supporting asymmetric heterogeneous multicore systems on QEMU by considering two possible implementation strategies: one-process and multi-process. We discuss the challenges of each strategy, present our implementations, and compare the two approaches qualitatively and quantitatively.
