Predictive Resource Management for Next-Generation High-Performance Computing Heterogeneous Platforms

High-Performance Computing (HPC) is rapidly moving towards nodes built from a heterogeneous set of processing resources. This trend has already shown benefits in both performance and energy efficiency. On the other hand, heterogeneous systems are challenging from both the application development and the resource management perspectives. In this work, we discuss some outcomes of the MANGO project, presenting results from executing real applications on an emulated, deeply heterogeneous HPC system. We also assess a proposed resource allocation policy that aims to identify, a priori, the best resource allocation options for an application that is about to start.
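The abstract does not detail how the policy ranks candidate allocations. The following is only an illustrative sketch. It assumes that execution time and average power for each candidate allocation have already been predicted, for example from profiling runs. Under that assumption, an a-priori selection could pick the lowest-energy option that meets a time constraint. The names `AllocationOption` and `select_allocation` are hypothetical, as are the energy objective, the fallback rule, and the numbers below. None of them are taken from MANGO.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AllocationOption:
    """Hypothetical candidate allocation with metrics predicted before launch."""
    name: str                 # illustrative label for the processing resource
    predicted_time_s: float   # assumed predicted execution time (seconds)
    predicted_power_w: float  # assumed predicted average power (watts)

    @property
    def predicted_energy_j(self) -> float:
        # Energy = average power x execution time
        return self.predicted_time_s * self.predicted_power_w


def select_allocation(options: List[AllocationOption], deadline_s: float) -> AllocationOption:
    """Choose, before the application starts, the lowest-energy option
    predicted to meet the deadline; if none is, fall back to the fastest."""
    if not options:
        raise ValueError("no allocation options available")
    feasible = [o for o in options if o.predicted_time_s <= deadline_s]
    if feasible:
        return min(feasible, key=lambda o: o.predicted_energy_j)
    return min(options, key=lambda o: o.predicted_time_s)


if __name__ == "__main__":
    # Made-up predictions, for illustration only
    candidates = [
        AllocationOption("cpu-cores", 10.0, 20.0),
        AllocationOption("gpu", 2.0, 150.0),
        AllocationOption("fpga", 4.0, 30.0),
    ]
    print(select_allocation(candidates, deadline_s=5.0).name)  # fpga
```

With a 5 s deadline, only the "gpu" (300 J) and "fpga" (120 J) options are predicted to be feasible, so the sketch selects "fpga". A real policy could also weigh resource availability or contention among concurrently running applications, which this sketch ignores.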
