The Multi-Dataflow Composer tool: An open-source tool suite for optimized coarse-grain reconfigurable hardware accelerators and platform design

Abstract Modern embedded and cyber-physical systems demand ever more performance, power efficiency and flexibility: they must execute several profiles and functionalities to meet ever-growing adaptivity needs while preserving execution efficiency. Such requirements have pushed designers towards heterogeneous and reconfigurable substrates, whose development and management are far from straightforward. Although acceleration and flexibility are desirable in many domains, hardware deployment and operation remain a barrier, since they require specific advanced expertise and skills. These challenges can be effectively tackled through automation strategies that in some cases, as in the proposed work, exploit model-based approaches. This paper focuses on the Multi-Dataflow Composer (MDC) tool, which aims to solve issues related to the design, optimization and operation of coarse-grain reconfigurable hardware accelerators and to ease their adoption in modern heterogeneous substrates. The latest MDC features and improvements are introduced in detail and assessed in robotics, an application field so far unexplored. A multi-profile trajectory generator for a robotic arm is implemented on a Xilinx FPGA board to show in which cases coarse-grain reconfiguration can be applied, and which parameters and trade-offs MDC lets users tune.
