Floating-Point Precision Tuning Using Blame Analysis

While tremendously useful, automated techniques for tuning the precision of floating-point programs face important scalability challenges. We present Blame Analysis, a novel dynamic approach that speeds up precision tuning. Blame Analysis executes floating-point instructions using different levels of accuracy for their operands, and determines the precision of all operands such that a given precision is achieved in the final result of the program. Our evaluation on ten scientific programs shows that Blame Analysis is successful in lowering operand precision. Because it executes the program only once, the analysis is particularly useful when targeting reductions in execution time; in that case, it needs to be combined with search-based tools such as Precimonious. Our experiments show that combining Blame Analysis with Precimonious leads to better results with a significant reduction in analysis time: the optimized programs execute faster (in three cases, we observe program speedups as high as 39.9%), and the combined analysis is 9× faster on average, and up to 38× faster, than Precimonious alone.
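
To give a flavor of the approach, the sketch below applies the same idea to a single instruction: re-execute one floating-point operation with its operands truncated to increasingly accurate precision levels, and keep the cheapest assignment whose result still agrees with the full-precision result to a target number of digits. This is only a minimal illustration under simplifying assumptions; the precision levels, the decimal-digit truncation, and the helper names (truncate, agrees, blame_operation) are hypothetical stand-ins, not the paper's implementation, which operates on actual floating-point formats during a single instrumented run.

```python
import itertools
import math

# Hypothetical precision levels: number of significant decimal digits kept
# when an operand is truncated. The real analysis works with actual binary
# formats (float, double); decimal digits are used here only to keep the
# illustration short.
PRECISIONS = [4, 8, 17]
TARGET_DIGITS = 6  # required agreement with the full-precision result


def truncate(x, digits):
    """Round x to `digits` significant decimal digits."""
    if x == 0.0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))
    scale = 10 ** (digits - 1 - exponent)
    return round(x * scale) / scale


def agrees(result, reference, digits):
    """Do result and reference match to `digits` significant digits?"""
    return truncate(result, digits) == truncate(reference, digits)


def blame_operation(op, operands):
    """For one instruction, find the cheapest operand precisions whose
    result still agrees with the full-precision result."""
    reference = op(*operands)
    # Try assignments from cheapest (all low precision) to most expensive.
    for combo in sorted(itertools.product(PRECISIONS, repeat=len(operands)), key=sum):
        lowered = [truncate(v, p) for v, p in zip(operands, combo)]
        if agrees(op(*lowered), reference, TARGET_DIGITS):
            return combo
    return tuple(PRECISIONS[-1] for _ in operands)


# Example: a multiply whose result only needs 6 significant digits can
# tolerate operands carrying roughly single-precision accuracy.
print(blame_operation(lambda a, b: a * b, (math.pi, math.e)))  # -> (8, 8)
```

Scaled up, per-instruction decisions like this are what let the analysis report, after a single run, which operands can safely be kept in lower precision while the program's final result still meets the requested accuracy.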
