In the era of Cyber-Physical Systems, designers need to support run-time adaptivity under different constraints, including the internal status of the system. This work presents a run-time monitoring approach, based on the Performance Application Programming Interface, that offers a unified interface to transparently access both the standard Performance Monitoring Counters (PMCs) in the CPUs and the custom ones integrated into hardware accelerators. Automatic tools support software programmers in designing and implementing Coarse-Grain Virtual Reconfigurable Circuits instrumented with custom PMCs. The approach has been validated on a heterogeneous application for image/video processing, with an overhead of 6% of the execution time.

1 Context and Objectives

Cyber-Physical Systems (CPS) are complex systems composed of different components and characterized by a strong interaction with the environment and users. In particular, they need to adapt their behaviour according to the environment, user requests, and their own internal status [1]. The H2020 CERBERO European Project [2, 3] is developing a continuous design environment for CPS, relying on a set of tools developed by project partners. Effective support for run-time adaptation in heterogeneous systems, taking into account a plethora of different internal and external triggers, is among the expected CERBERO outcomes, and a fundamental step is monitoring the hardware (Hw) and software (Sw) elements of the heterogeneous system [4].

This paper focuses on one fundamental step necessary to design self-adaptive systems: the monitoring of heterogeneous architectures, where processing cores are connected to custom hardware accelerators that can be reconfigured at run-time. One of the reconfigurable Hw infrastructures supported in CERBERO is the Coarse-Grain Virtual Reconfigurable Circuit (CG-VRC) [5].
CG-VRCs offer fast and low-power reconfiguration, with a good trade-off between performance and flexibility, which makes them suitable for run-time Hw adaptation. In these systems, all the resources belonging to all the configurations are instantiated in the substrate, and different configurations are enabled by multiplexing resources in time [6]. They can be implemented on both Field Programmable Gate Array (FPGA) and Application Specific Integrated Circuit (ASIC) systems. These accelerators are suited to support:

1. Functional-oriented adaptivity: the application is able to execute different functionalities over the same substrate (e.g., algorithm changes) [7].
2. Non-functional-oriented adaptivity: the application is able to execute only one functionality, but with different performance (e.g., the precision of a filter could be reduced to save energy) [8].

In CERBERO, the Multi-Dataflow Composer (MDC) [9] tool automates the development of CG-VRCs. Users describe the applications to be accelerated as dataflows, and MDC automatically merges them through a datapath merging algorithm, generating a Xilinx-compliant IP together with the drivers needed to delegate computing tasks to the coprocessor [10].

The first step to enable a feedback loop that allows for the design of self-adaptive CPS consists of instrumenting the system with monitors to capture changes in its internal status [4]. The most widespread Sw approach for enabling self-awareness is based on accessing the existing Performance Monitoring Counters (PMCs) of modern CPUs. On the other hand, a Hw accelerator can be specialized by the designer to include custom monitors. This second solution is not suitable for Sw developers, who may have limited knowledge of the Hw design flow.
Furthermore, if these solutions rely on custom methods to read the monitors, the procedure for reading the monitors in the Hw accelerators may differ from the one used for the PMCs already available on the CPU, so a heterogeneous set of solutions, complex to implement, may be required.

In CERBERO, PAPIFY [11, 12] provides a lightweight monitoring infrastructure by means of an event library aimed at generalizing the Performance Application Programming Interface (PAPI) [13] for embedded heterogeneous architectures. In a previous work [14] we proposed the idea of using PAPIFY in combination with MDC to support the design, implementation and monitoring of run-time reconfigurable systems such as CG-VRCs. In that work we presented a PAPI-compliant component that could be automatically configured with event information using an XML file.

The work presented in this paper relies on the idea of offering Sw developers the support to design and implement run-time reconfigurable systems and to monitor both the processor and the Hw accelerator using a unified methodology based on PAPIFY. In the current era of heterogeneous computing, a unified methodology allows a fairer comparison of Hw and Sw performance and facilitates performance analysis in terms of debugging (e.g., monitoring the correct execution of internal modules) and optimization (e.g., monitoring a CG-VRC makes it possible to switch among different configurations if the users require better performance).

• In this work the MDC tool has been extended to provide automatic instrumentation of the CG-VRCs with custom PMCs and to automatically generate the XML file needed to configure the previously developed PAPI component. This automatic flow allows Sw programmers to define the applications to be accelerated and instrumented as dataflow descriptions, without the need for any Hw knowledge.
• The Application Programming Interfaces (APIs) provided by MDC, in combination with the Sw libraries provided by PAPIFY, offer transparent PAPI-compliant access to the Hw PMCs.
• The monitoring of heterogeneous Hw/Sw systems is a mandatory step to allow self-adaptation of CPS. Nevertheless, in this preliminary exploration the design under test is not a CPS. The assessment on a processor-coprocessor system for image processing validates the automatic design flow, the PAPI-based monitoring approach and the effectiveness of PAPIFY on heterogeneous Hw/Sw systems.

The paper is organized as follows: Section 2 explores the state of the art, Section 3 presents the proposed unified Hw/Sw monitoring approach together with the exploited tools, and Section 4 presents a proof-of-concept evaluation of the effectiveness of the approach. Finally, Section 5 summarizes and concludes the paper with some directions for future work.
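
As a rough illustration of the unified monitoring idea introduced in this section, the following Python sketch models a toy accelerator whose two configurations share one substrate and which carries custom counters, read through the same start/stop interface as a processor-side counter. It is only a conceptual model under stated assumptions: it uses neither PAPI nor PAPIFY nor MDC, and every name in it (Component, CpuComponent, AcceleratorComponent, EventSet, the event names, and the two pixel kernels) is hypothetical and chosen for illustration.

```python
import time
from abc import ABC, abstractmethod


class Component(ABC):
    """Hypothetical counter source; stands in for one monitoring back end."""

    name: str

    @abstractmethod
    def read(self, event: str) -> int:
        """Return the current raw value of the named counter."""


class CpuComponent(Component):
    """Software side: a nanosecond timer used as a stand-in for a CPU PMC."""

    name = "cpu"

    def read(self, event: str) -> int:
        if event != "TIME_NS":
            raise KeyError(event)
        return time.perf_counter_ns()


class AcceleratorComponent(Component):
    """Toy reconfigurable accelerator with custom counters (assumption).

    Both configurations live in the same object (the "substrate");
    config_id plays the role of the multiplexer select signal.
    """

    name = "accel"

    def __init__(self) -> None:
        self.config_id = 0
        self._counters = {"INPUT_TOKENS": 0, "OUTPUT_TOKENS": 0}
        self._configs = {
            0: lambda px: 255 - px,  # configuration 0: invert a pixel
            1: lambda px: px // 2,   # configuration 1: halve a pixel
        }

    def run(self, pixels):
        kernel = self._configs[self.config_id]
        out = []
        for px in pixels:
            self._counters["INPUT_TOKENS"] += 1
            out.append(kernel(px))
            self._counters["OUTPUT_TOKENS"] += 1
        return out

    def read(self, event: str) -> int:
        return self._counters[event]


class EventSet:
    """Start/stop interface that treats every component the same way."""

    def __init__(self, components):
        self._components = {c.name: c for c in components}
        self._events = []
        self._start = {}

    def add_event(self, qualified_name: str) -> None:
        # Events are addressed as "<component>:<event>".
        comp, event = qualified_name.split(":", 1)
        if comp not in self._components:
            raise KeyError(comp)
        self._events.append((comp, event))

    def start(self) -> None:
        # Snapshot every registered counter.
        self._start = {
            (c, e): self._components[c].read(e) for c, e in self._events
        }

    def stop(self) -> dict:
        # Report the increase of each counter since start().
        return {
            f"{c}:{e}": self._components[c].read(e) - self._start[(c, e)]
            for c, e in self._events
        }


accel = AcceleratorComponent()
events = EventSet([CpuComponent(), accel])
for name in ("cpu:TIME_NS", "accel:INPUT_TOKENS", "accel:OUTPUT_TOKENS"):
    events.add_event(name)

events.start()
inverted = accel.run([0, 100, 255])
accel.config_id = 1  # switch configuration at run time
halved = accel.run([10, 20])
deltas = events.stop()
```

In this sketch the caller never distinguishes between processor-side and accelerator-side counters: both are registered by name and read through the same two calls, which is the property the unified approach aims for. In a real system the accelerator counters would be hardware registers rather than Python attributes.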
[1] Paolo Meloni, et al. Reconfigurable coprocessors synthesis in the MPEG-RVC domain. 2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig), 2015.
[2] Luigi Raffo, et al. Hardware/Software Self-adaptation in CPS: The CERBERO Project Approach. SAMOS, 2019.
[3] Luigi Raffo, et al. Challenging CPS Trade-off Adaptivity with Coarse-Grained Reconfiguration. ApplePies, 2017.
[4] Paolo Meloni, et al. Challenging the Best HEVC Fractional Pixel FPGA Interpolators With Reconfigurable and Multifrequency Approximate Computing. IEEE Embedded Systems Letters, 2017.
[5] Paolo Meloni, et al. Power-Awarness in Coarse-Grained Reconfigurable Multi-Functional Architectures: a Dataflow Based Strategy. J. Signal Process. Syst., 2017.
[6] Florian Arrestier, et al. PAPIFY: Automatic Instrumentation and Monitoring of Dynamic Dataflow Applications Based on PAPI. IEEE Access, 2019.
[7] Maxime Pelcat, et al. Spider: A Synchronous Parameterized and Interfaced Dataflow-based RTOS for multicore DSPS. 2014 6th European Embedded Design in Education and Research Conference (EDERC), 2014.
[8] Tiziana Fanni, et al. Run-time performance monitoring of hardware accelerators: POSTER. CF, 2019.
[9] Reiner W. Hartenstein, et al. Coarse grain reconfigurable architecture (embedded tutorial). ASP-DAC '01, 2001.
[10] Eduardo Juárez Martínez, et al. Automatic instrumentation of dataflow applications using PAPI. CF, 2018.
[11] Leonardo Suriano, et al. A Unified Hardware/Software Monitoring Method for Reconfigurable Computing Architectures Using PAPI. 2018 13th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), 2018.
[12] Luigi Pomante, et al. Hardware performance sniffers for embedded systems profiling. 2015 12th International Workshop on Intelligent Solutions in Embedded Systems (WISES), 2015.
[13] Eduardo de la Torre, et al. FPGA-Based High-Performance Embedded Systems for Adaptive Edge Computing in Cyber-Physical Systems: The ARTICo3 Framework. Sensors, 2018.
[14] Maxime Pelcat, et al. Preesm: A dataflow-based rapid prototyping framework for simplifying multicore DSP programming. 2014 6th European Embedded Design in Education and Research Conference (EDERC), 2014.
[15] Luigi Pomante, et al. A Flexible Profiling Sub-System for Reconfigurable Logic Architectures. 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), 2016.
[16] Ron Sass, et al. HwPMI: An Extensible Performance Monitoring Infrastructure for Improving Hardware Design and Productivity on FPGAs. Int. J. Reconfigurable Comput., 2012.
[17] Marco Platzner, et al. A hardware/software infrastructure for performance monitoring on LEON3 multicore platforms. 2014 24th International Conference on Field Programmable Logic and Applications (FPL), 2014.
[18] Luigi Raffo, et al. Dataflow-Functional High-Level Synthesis for Coarse-Grained Reconfigurable Accelerators. IEEE Embedded Systems Letters, 2019.
[19] François Berry, et al. CAPH: a language for implementing stream-processing applications on FPGAs. 2013.
[20] Panganamala Ramana Kumar, et al. Cyber–Physical Systems: A Perspective at the Centennial. Proceedings of the IEEE, 2012.
[21] Guillaume Patrigeon, et al. FPGA-Based Platform for Fast Accurate Evaluation of Ultra Low Power SoC. 2018 28th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS), 2018.
[22] Matthias S. Müller, et al. The Vampir Performance Analysis Tool-Set. Parallel Tools Workshop, 2008.
[23] Luigi Raffo, et al. Cross-layer design of reconfigurable cyber-physical systems. Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017.
[24] Lesley Shannon, et al. Performance monitoring for multicore embedded computing systems on FPGAs. ArXiv, 2015.
[25] Nathan R. Tallent, et al. HPCTOOLKIT: tools for performance analysis of optimized parallel programs. Concurr. Comput. Pract. Exp., 2010.
[26] E. R. Davies, et al. Circularity - a new principle underlying the design of accurate edge orientation operators. Image Vis. Comput., 1984.
[27] Luigi Raffo, et al. Multi-Grain Reconfiguration for Advanced Adaptivity in Cyber-Physical Systems. 2018 International Conference on ReConFigurable Computing and FPGAs (ReConFig), 2018.
[28] Henk Corporaal, et al. Coarse grained reconfigurable architectures in the past 25 years: Overview and classification. 2016 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), 2016.