DAMHSE: Programming heterogeneous MPSoCs with hardware acceleration using dataflow-based design space exploration and automated rapid prototyping

Abstract Heterogeneous Multiprocessor Systems-on-a-Chip (MPSoCs) with programmable hardware acceleration are currently gaining market share in the embedded device domain. The largest MPSoCs combine several software processing cores with programmable logic. In these systems, reaching optimal implementation performance is difficult because many manual, time-consuming steps are required to build, from the application specification, a prototype whose performance can be measured. This paper develops DAMHSE (DAtaflow Method for Hardware/Software Exploration), a method that, based on state-of-the-art tools and High-Level Synthesis (HLS), deploys a complete hardware-software rapid prototype in less than an hour from a single dataflow-based application representation. A human-driven Design Space Exploration (DSE) is conducted to find the best-performing architectural solution, and compilable/synthesizable code is generated. The method has been tested on an image processing system with software and hardware parallelism. Results show that the absolute performance obtained (in pixels per cycle) reaches the state of the art, and that DAMHSE leads to a heterogeneous system whose performance increases significantly when the application is granted more hardware resources. One of the greatest challenges in creating such a design automation method is that application behavior may change over time, affecting application concurrency and system performance. To overcome this problem, the design-time DAMHSE method is complemented with a runtime application management system that dynamically dispatches jobs (tasks) among the available processing elements (CPUs and/or hardware accelerators). Experimental results show that the performance penalty due to runtime application mapping and scheduling is limited and that the computational performance of the adaptive system remains high.
Apart from the vendor-specific HLS tool, the tools and frameworks used by the proposed method are open source, and tutorials are available to reproduce the results.
