PAPIFY: Automatic Instrumentation and Monitoring of Dynamic Dataflow Applications Based on PAPI

The complexity-productivity gap in application development has widened in recent years and has become an important issue for developers. New design methods try to bridge this gap by automating most of the designer's tasks. In addition, new Models of Computation (MoCs), such as dataflow-based ones, ease the expression of parallelism within applications, leading to higher designer productivity. Rapid prototyping design tools offer fast estimations of the soundness of design choices. A key step when prototyping an application is to have representative performance indicators with which to estimate the validity of those design choices. Such indicators can be obtained from hardware information, and new libraries such as the Performance Application Programming Interface (PAPI) ease access to that information. In this work, the Papify toolbox is presented as a tool to perform automatic PAPI-based instrumentation of dynamic dataflow applications. It integrates Papify with a dataflow Y-chart-based design framework called Preesm and with its companion run-time reconfiguration manager, the Synchronous Parameterized and Interfaced Dataflow Embedded Runtime (SPiDER). The Papify toolbox comprises an automatic code generator for static and dynamic applications, a dedicated library to manage monitoring at run-time, and two User Interfaces (UIs) to ease both the configuration and the analysis of the captured run-time information. Its main advantages are 1) its capability to adapt the monitoring to the system status and 2) its capability to adapt the monitoring to application workload redistribution at run-time. A thorough overhead characterization using the Sobel-morpho and Stereo-matching dataflow applications shows that the Papify run-time monitoring overhead is up to 10%.
