PAAS: A system-level simulator for heterogeneous computing architectures

Heterogeneous computing with hardware accelerators is a promising direction for overcoming the power and performance walls of traditional computing systems. CPU-accelerator integrated architectures, such as CPUs coupled with ASIC- or FPGA-based accelerators, can provide processing that is customized to application requirements. This makes them particularly attractive for speeding up computation-intensive applications. Design space exploration toward architecture and design optimization therefore needs system-level simulation that precisely captures the interaction among CPUs, hardware accelerators, and the memory system. In this work, we present PAAS (Processor Accelerator Architecture Simulator), a system-level simulator that enables cycle-accurate full-system simulation of CPU-accelerator heterogeneous systems. PAAS readily supports flexible architectural configurations, such as different on-chip interconnection topologies and memory hierarchies. Using PAAS, this paper also analyzes how different architectural configurations affect the performance of benchmark applications with different execution characteristics on FPGA-based accelerators. Furthermore, as an example of the research capability of PAAS, this paper proposes and investigates a cache-partitioning scheme for improving the performance of shared-cache-based CPU-FPGA systems.
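The abstract does not describe how the proposed cache-partitioning scheme works. The sketch below is therefore a generic illustration of one common form of cache partitioning and is not the PAAS scheme. It shows way-based partitioning of a shared cache between a CPU and an FPGA accelerator. The class name `WayPartitionedCache`, the per-set way quotas, and the LRU-within-partition replacement policy are all hypothetical choices. Any requester may hit on any resident line, but on a miss it can only allocate into, and evict from, its own ways.

```python
from collections import OrderedDict


class WayPartitionedCache:
    """Shared set-associative cache with per-requester way quotas (hypothetical sketch).

    Any requester can hit on any resident line, but on a miss a requester may
    only allocate into its own partition of each set, so one requester's
    misses never evict another requester's lines. Replacement is LRU within
    each partition.
    """

    def __init__(self, num_sets, line_size, way_quota):
        self.num_sets = num_sets
        self.line_size = line_size
        self.way_quota = dict(way_quota)  # e.g. {"cpu": 6, "fpga": 2} ways per set
        # sets[i][requester] maps tag -> None, ordered from least to most recently used
        self.sets = [{r: OrderedDict() for r in self.way_quota} for _ in range(num_sets)]
        self.hits = {r: 0 for r in self.way_quota}
        self.misses = {r: 0 for r in self.way_quota}

    def _index_tag(self, addr):
        """Split a byte address into (set index, tag)."""
        line = addr // self.line_size
        return line % self.num_sets, line // self.num_sets

    def access(self, requester, addr):
        """Look up addr on behalf of requester; return True on a hit."""
        idx, tag = self._index_tag(addr)
        cache_set = self.sets[idx]
        for partition in cache_set.values():
            if tag in partition:
                partition.move_to_end(tag)  # refresh LRU position in the owning partition
                self.hits[requester] += 1
                return True
        self.misses[requester] += 1
        quota = self.way_quota[requester]
        if quota == 0:
            return False  # requester owns no ways: the line bypasses the cache
        own = cache_set[requester]
        if len(own) >= quota:
            own.popitem(last=False)  # evict the LRU line of this requester only
        own[tag] = None
        return False


if __name__ == "__main__":
    # One 4-way set split 2/2: the CPU reuses two lines while the FPGA
    # streams four new lines between each pair of CPU accesses.
    cache = WayPartitionedCache(num_sets=1, line_size=64, way_quota={"cpu": 2, "fpga": 2})
    stream_addr = 128
    for _ in range(5):
        cache.access("cpu", 0)
        cache.access("cpu", 64)
        for _ in range(4):
            cache.access("fpga", stream_addr)
            stream_addr += 64
    print(cache.hits, cache.misses)  # {'cpu': 8, 'fpga': 0} {'cpu': 2, 'fpga': 20}
```

In this toy access pattern, the partitioned set keeps the CPU's two reused lines resident after their first misses. Consider instead an unpartitioned 4-way LRU set: each four-line FPGA burst would evict both CPU lines, so every CPU access would miss. Isolating a streaming accelerator from a CPU working set in this way is one motivation for partitioning a shared cache.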
