Analysis of a heterogeneous multi-core, multi-hw-accelerator-based system designed using PREESM and SDSoC

New heterogeneous system technologies are flooding the market: over the past years, computing platforms have moved from single CPUs to multi-core devices that combine CPUs, GPUs and large FPGAs, such as the Xilinx Zynq-7000 or Zynq UltraScale+ MPSoC architectures. In this context, it is important to give developers transparent deployment capabilities so that different applications run efficiently on such complex devices. This paper presents a design flow that combines PREESM, a dataflow-based prototyping framework, with Xilinx SDSoC, an HLS-based framework that automatically generates and manages hardware accelerators. The integration couples the automatic, static task scheduling obtained from PREESM with asynchronous invocations that trigger the parallel execution of multiple hardware accelerators from some of their associated sequential software threads. An image processing application serves as a proof of concept. It shows the interoperability of both tools, the level of design automation achieved and, for the resulting computing architecture, good performance scalability with the number of accelerators and software threads.
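The execution model summarized above (a static mapping of tasks to sequential software threads, each of which starts accelerator work asynchronously and then waits for it) can be illustrated with a minimal host-side sketch. This is not PREESM-generated code and does not use the SDSoC API: the names `accelerator` and `run_static_schedule`, the pixel-inversion kernel and the thread pools standing in for hardware accelerators are all assumptions made for illustration. SDSoC offers a comparable start/wait split through its `async` and `wait` pragmas; the abstract does not state whether the design flow uses exactly that mechanism.

```python
from concurrent.futures import ThreadPoolExecutor


def accelerator(tile):
    """Hypothetical stand-in for a hardware accelerator: inverts 8-bit pixels."""
    return [255 - p for p in tile]


def run_static_schedule(schedule, n_accelerators):
    """Run a fixed, non-empty mapping {thread_id: [tiles in firing order]}.

    Each software thread walks its own task list sequentially. For every
    task it submits the accelerator call asynchronously and then blocks on
    that call's result, so several threads can keep several accelerators
    busy at the same time.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=n_accelerators) as accel_pool, \
         ThreadPoolExecutor(max_workers=len(schedule)) as sw_threads:

        def sw_thread(thread_id):
            outputs = []
            for tile in schedule[thread_id]:
                future = accel_pool.submit(accelerator, tile)  # asynchronous start
                outputs.append(future.result())                # wait for completion
            return outputs

        pending = {tid: sw_threads.submit(sw_thread, tid) for tid in schedule}
        for tid, fut in pending.items():
            results[tid] = fut.result()
    return results


if __name__ == "__main__":
    # Two software threads; the mapping is fixed before execution starts.
    schedule = {0: [[0, 10], [20, 30]], 1: [[255, 128]]}
    print(run_static_schedule(schedule, n_accelerators=2))
```

In this sketch the task order inside each thread never changes at run time, which mirrors the static nature of the schedule. Only the overlap between threads, and therefore between accelerators, is decided dynamically by the pools.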
