HPAC: evaluating approximate computing techniques on HPC OpenMP applications

As we approach the limits of Moore's law, researchers are exploring new paradigms for future high-performance computing (HPC) systems. Approximate computing has gained traction because it promises substantial performance gains in exchange for reduced accuracy. However, given the stringent accuracy requirements of HPC scientific applications, broad adoption of approximate computing in HPC requires an in-depth understanding of how amenable each application is to approximation. We develop HPAC, a framework with compiler and runtime support for annotating and transforming code and for analyzing the accuracy vs. performance trade-offs of OpenMP HPC applications. We use HPAC to perform an in-depth analysis of the effectiveness of approximate computing techniques when applied to HPC applications. The results reveal the potential performance gains of approximation and its interplay with parallel execution. For instance, in the LULESH proxy application, approximation provides substantial performance gains because it reduces memory accesses. In the leukocyte benchmark, however, approximation induces load imbalance in the parallel execution, which limits the performance gains.
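To illustrate how approximation can cut memory accesses in an OpenMP loop, here is a minimal sketch of loop perforation, a widely used approximation technique that skips a fraction of loop iterations. It is a generic illustration, not HPAC's annotation syntax or implementation; the function names `exact_sum_sq` and `perforated_sum_sq` and the `stride` parameter are hypothetical. Compiled without OpenMP support, the pragmas are ignored and the loops run serially.

```c
#include <stddef.h>

/* Exact reduction: reads all n elements of x. */
double exact_sum_sq(const double *x, long n) {
    double s = 0.0;
    #pragma omp parallel for reduction(+ : s)
    for (long i = 0; i < n; i++)
        s += x[i] * x[i];
    return s;
}

/*
 * Perforated reduction (hypothetical helper, not HPAC's API): executes only
 * every `stride`-th iteration, so it reads about n/stride elements, then
 * rescales the partial sum by n / (iterations executed) to estimate the
 * exact result. If `executed` is non-NULL, it receives the iteration count.
 */
double perforated_sum_sq(const double *x, long n, long stride, long *executed) {
    double s = 0.0;
    long count = 0;
    if (stride < 1)
        stride = 1; /* stride 1 means no perforation */
    #pragma omp parallel for reduction(+ : s, count)
    for (long i = 0; i < n; i += stride) {
        s += x[i] * x[i];
        count++;
    }
    if (executed != NULL)
        *executed = count;
    return count > 0 ? s * ((double)n / (double)count) : 0.0;
}
```

With `stride = 4`, the perforated loop touches one quarter of the array. For non-uniform data, the rescaled estimate deviates from the exact sum, and this accuracy loss must be weighed against the time saved, which is the kind of trade-off HPAC is designed to analyze.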
