Power-Efficient Computing: Experiences from the COSA Project

Energy consumption is today one of the most pressing issues in operating HPC systems for scientific applications. Unconventional computing systems are therefore of great interest to several scientific communities looking for a better tradeoff between time-to-solution and energy-to-solution. In this context, assessing the performance of processors with a high performance-per-watt ratio is necessary to understand how to build energy-efficient computing systems for scientific applications around this class of processors. Computing On SOC Architecture (COSA) is a three-year project (2015–2017) funded by Scientific Commission V of the Italian National Institute for Nuclear Physics (INFN). It aims to investigate the performance and total cost of ownership of computing systems based on commodity low-power Systems on Chip (SoCs) and of highly energy-efficient systems based on GP-GPUs. In this work, we present the results of the project, analyzing the performance of several scientific applications on a range of GPU- and SoC-based systems. We also describe the methodology we used to measure energy performance and the tools we implemented to monitor the power drawn by applications at run time.
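The energy-to-solution mentioned above is the integral of the power drawn over the run, E = ∫ P(t) dt from start to finish, while time-to-solution is the wall-clock duration of the run. As a minimal sketch, assuming power is sampled at discrete timestamps (for example by an external meter), both quantities can be estimated from the samples with the trapezoidal rule. The function names and sample values below are hypothetical illustrations, not the COSA monitoring tools.

```python
from typing import Sequence


def energy_to_solution(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate sampled power (W) over time (s) with the trapezoidal rule; returns joules."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have the same length")
    if len(timestamps_s) < 2:
        raise ValueError("need at least two samples to integrate")
    energy_j = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Area of the trapezoid between two consecutive power samples.
        energy_j += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy_j


def time_to_solution(timestamps_s: Sequence[float]) -> float:
    """Wall-clock duration covered by the samples, in seconds."""
    return timestamps_s[-1] - timestamps_s[0]


if __name__ == "__main__":
    # Hypothetical readings: one sample per second from an assumed external power meter.
    t = [0.0, 1.0, 2.0, 3.0, 4.0]
    p = [5.0, 7.0, 7.0, 6.0, 5.0]
    print(f"time-to-solution:   {time_to_solution(t):.1f} s")
    print(f"energy-to-solution: {energy_to_solution(t, p):.1f} J")
```

With these hypothetical readings the script reports a time-to-solution of 4.0 s and an energy-to-solution of 25.0 J. Comparing two systems then amounts to comparing these pairs: a slower low-power SoC can still use less energy overall if its lower power draw more than offsets its longer run.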
