Exploring heterogeneous scheduling for edge computing with CPU and FPGA MPSoCs

Abstract: This paper presents a framework targeted at low-cost, low-power heterogeneous multiprocessors that combine FPGAs and multicore CPUs, with the overarching goal of providing developers with a productive programming model and runtime support to fully exploit all the available processing resources. FPGA productivity is achieved through a high-level programming model based on OpenCL, the standard for cross-platform parallel heterogeneous programming. In this work, we focus on the parallel_for pattern and, as part of the runtime support for this pattern, we leverage a new scheduler that strives to maximize the number of iterations per joule by dynamically and adaptively partitioning the iteration space between the multicore and the accelerator when both work simultaneously. A total of 7 benchmarks are ported and optimized for a low-cost DE1 board. The results show that the heterogeneous solution improves performance by up to 2.9× and energy efficiency by up to 2.7× compared to the traditional approach of keeping all the CPU cores idle while the accelerator computes the workload. Our results also yield two interesting insights: first, an adaptive scheduler that can find at runtime the right chunk size for each type of application and device configuration is an essential component for these kinds of heterogeneous platforms; and second, device configurations that provide higher throughput do not always achieve better energy efficiency when only the running power (excluding the idle power component) is considered.
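
To make the adaptive partitioning idea concrete, the following is a minimal C++ sketch of a heterogeneous parallel_for loop that splits the iteration space between the CPU cores and an FPGA and tunes the FPGA chunk size toward higher iterations per joule. It is not the paper's implementation (which builds on OpenCL and a parallel_for template with runtime support); the names process_on_cpu, process_on_fpga, and energy_used_joules are hypothetical placeholders, and both the FPGA offload and the energy reading are stubbed so the example stays self-contained.

```cpp
// Sketch of adaptive CPU+FPGA iteration-space partitioning (hypothetical API).
#include <algorithm>
#include <cstdio>
#include <thread>

// Stub: run iterations [begin, end) on the multicore (e.g., via TBB/OpenMP).
static void process_on_cpu(int begin, int end) { /* CPU kernel here */ }

// Stub: enqueue iterations [begin, end) on the FPGA (e.g., via an OpenCL queue).
static void process_on_fpga(int begin, int end) { /* FPGA kernel here */ }

// Stub: energy consumed by the last chunk, in joules (read from a power sensor).
static double energy_used_joules() { return 1.0; }

int main() {
  const int total_iterations = 1 << 20;
  int next = 0;                 // first unassigned iteration
  int fpga_chunk = 4096;        // adaptive chunk handed to the FPGA
  double best_iter_per_joule = 0.0;

  while (next < total_iterations) {
    // Carve the next chunk for the FPGA; the CPU cores take a slice of the
    // remaining iterations so both devices compute simultaneously.
    int fpga_begin = next;
    int fpga_end   = std::min(next + fpga_chunk, total_iterations);
    int cpu_begin  = fpga_end;
    int cpu_end    = std::min(cpu_begin + fpga_chunk / 4, total_iterations);
    next = cpu_end;

    std::thread fpga_worker(process_on_fpga, fpga_begin, fpga_end);
    process_on_cpu(cpu_begin, cpu_end);
    fpga_worker.join();

    // Adapt: grow the FPGA chunk while iterations-per-joule keeps improving,
    // shrink it otherwise (a crude stand-in for the scheduler's policy).
    double iter_per_joule = (fpga_end - fpga_begin) / energy_used_joules();
    if (iter_per_joule >= best_iter_per_joule) {
      best_iter_per_joule = iter_per_joule;
      fpga_chunk = std::min(fpga_chunk * 2, total_iterations);
    } else {
      fpga_chunk = std::max(fpga_chunk / 2, 256);
    }
  }
  std::printf("done: %d iterations\n", total_iterations);
  return 0;
}
```

The grow/shrink policy above is only one possible heuristic; the key point it illustrates is that the chunk size is chosen at runtime from measured energy and throughput rather than fixed per application.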
