Application instrumentation for performance analysis and tuning with focus on energy efficiency

[1]  Jack J. Dongarra,et al.  Collecting Performance Data with PAPI-C , 2009, Parallel Tools Workshop.

[2]  Samuel Williams,et al.  The Landscape of Parallel Computing Research: A View from Berkeley , 2006 .

[3]  Christiaan J. J. Paredis,et al.  The Role and Limitations of Modeling and Simulation in Systems Design , 2004 .

[4]  Qin Zhao,et al.  Transparent dynamic instrumentation , 2012 .

[5]  Luca Benini,et al.  DiG: enabling out-of-band scalable high-resolution monitoring for data-center analytics, automation and control (extended) , 2018, Cluster Computing.

[6]  Susan L. Graham,et al.  Gprof: A call graph execution profiler , 1982, SIGPLAN '82.

[7]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[8]  Jeffrey S. Vetter,et al.  Automated Characterization of Parallel Application Communication Patterns , 2015, HPDC.

[9]  Allen D. Malony,et al.  The Tau Parallel Performance System , 2006, Int. J. High Perform. Comput. Appl..

[10]  Dirk Schmidl,et al.  Score-P: A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir , 2011, Parallel Tools Workshop.

[11]  Andres Charif Rubial,et al.  Performance Tuning of x86 OpenMP Codes with MAQAO , 2009, Parallel Tools Workshop.

[12]  Cédric Valensi.  A generic approach to the definition of low-level components for multi-architecture binary analysis , 2014 .

[13]  Jesús Labarta,et al.  DiP: A Parallel Program Development Environment , 1996, Euro-Par, Vol. II.

[14]  Nicholas Nethercote,et al.  Valgrind: a framework for heavyweight dynamic binary instrumentation , 2007, PLDI '07.

[15]  Venkatesh Kannan,et al.  Evaluation of the HPC applications dynamic behavior in terms of energy consumption , 2017 .

[16]  Lubomir Riha,et al.  Overview of Application Instrumentation for Performance Analysis and Tuning , 2019, PPAM.

[17]  Ondrej Meca,et al.  A massively parallel and memory-efficient FEM toolbox with a hybrid total FETI solver with accelerator support , 2018, Int. J. High Perform. Comput. Appl..

[18]  Frank Mueller,et al.  Uncore power scavenger: a runtime for uncore power conservation on HPC systems , 2019, SC.

[19]  Eduardo Cesar Galobardes,et al.  Automatic Tuning of HPC Applications. The Periscope Tuning Framework , 2015 .

[20]  Jesús Labarta,et al.  Framework for a productive performance optimization , 2013, Parallel Comput..

[21]  Barton P. Miller,et al.  Anywhere, any-time binary instrumentation , 2011, PASTE '11.

[22]  Nicholas Nethercote,et al.  Valgrind: a framework for heavyweight dynamic binary instrumentation , 2007, PLDI.

[23]  Qin Zhao,et al.  Transparent dynamic instrumentation , 2012, VEE '12.

[24]  Venkatesh Kannan,et al.  The READEX formalism for automatic tuning for energy efficiency , 2016, Computing.

[25]  Wolfgang E. Nagel,et al.  Run-Time Exploitation of Application Dynamism for Energy-Efficient Exascale Computing (READEX) , 2015, 2015 IEEE 18th International Conference on Computational Science and Engineering.

[26]  Martin Schulz,et al.  A Run-Time System for Power-Constrained HPC Applications , 2015, ISC.

[27]  Bronis R. de Supinski,et al.  Adagio: making DVS practical for complex HPC applications , 2009, ICS.

[28]  Gerhard Wellein,et al.  LIKWID: Lightweight Performance Tools , 2011, CHPC.

[29]  Hermann Härtig,et al.  Measuring energy consumption for short code paths using RAPL , 2012, PERV.

[30]  Wolfgang E. Nagel,et al.  HDEEM: High Definition Energy Efficiency Monitoring , 2014, 2014 Energy Efficient Supercomputing Workshop.

[31]  Frank Mueller,et al.  Power tuning HPC jobs on power-constrained systems , 2016, 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT).

[32]  Jack J. Dongarra,et al.  Investigating power capping toward energy‐efficient scientific applications , 2019, Concurr. Comput. Pract. Exp..

[33]  Luca Benini,et al.  COUNTDOWN - three, two, one, low power! A Run-time Library for Energy Saving in MPI Communication Primitives , 2018, ArXiv.

[34]  Christian Bischof,et al.  Parallel computing : architectures, algorithms and applications , 2008 .

[35]  Fuat Keceli,et al.  Global Extensible Open Power Manager: A Vehicle for HPC Community Collaboration on Co-Designed Energy Management Solutions , 2017, ISC.

[36]  Bernd Mohr,et al.  The Scalasca performance toolset architecture , 2010, Concurr. Comput. Pract. Exp..

[37]  Michael Laurenzano,et al.  PEBIL: Efficient static binary instrumentation for Linux , 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).