A language extension set to generate adaptive versions automatically

A large part of the development effort of compute-intensive applications is devoted to optimization, i.e., completing the computation within a finite budget of time, space, or energy. Given the complexity of modern architectures, writing simulation applications is often a two-step workflow. First, developers design a sequential program for algorithmic tuning and debugging purposes. Second, experts optimize the original program and exploit possible approximations of it to scale to the actual problem size. This second step is tedious, time-consuming, and error-prone. In this paper, we investigate language extensions and compiler tools to perform that step semi-automatically in the context of approximate computing. We identify the semantic and syntactic information a compiler needs to handle approximation and adaptive techniques automatically for a particular class of programs. We propose a set of language extensions generic enough to provide the compiler with this semantic information in the cases where approximation is beneficial. We implemented a compiler infrastructure that exploits these extensions to automatically generate an adaptively approximated version of a program. We provide an experimental study of the impact and expressiveness of our language extension set on various applications.
