GLORE: generalized loop redundancy elimination upon LER-notation

This paper presents GLORE, a novel approach to detecting and removing large-scoped redundant computations in nested loops. GLORE works on LER-notation, a new representation of computations in both regular and irregular loops. Combined with a set of novel algorithms, this representation lets GLORE systematically consider computation reordering at both the expression level and the loop level in a unified manner. GLORE applies to a much broader range of loops than prior methods do, and it frequently lowers the computational complexity of nested loops that elude prior optimization techniques, producing significantly larger speedups.
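
To make the kind of redundancy concrete, the following is a minimal hand-written sketch in Python. It is not GLORE's algorithm and does not use LER-notation; the function names (`naive_scaled_sums`, `reordered_scaled_sums`, `naive_prefix_sums`, `incremental_prefix_sums`) and the two loop nests are hypothetical examples chosen for illustration. The first pair shows an expression-level reordering (distributivity lets the sum over `b` be computed once), and the second pair shows a loop-level reuse where the inner bound depends on the outer index.

```python
def naive_scaled_sums(a, b):
    """O(n*m): the sum over b is recomputed for every element of a."""
    out = []
    for x in a:
        total = 0
        for y in b:
            total += x * y
        out.append(total)
    return out


def reordered_scaled_sums(a, b):
    """O(n+m): x * sum(b) equals sum(x * y for y in b), so sum(b) is hoisted."""
    sum_b = 0
    for y in b:
        sum_b += y
    return [x * sum_b for x in a]


def naive_prefix_sums(b):
    """O(n^2): iteration i re-adds everything iteration i-1 already added."""
    out = []
    for i in range(len(b)):
        total = 0
        for j in range(i + 1):
            total += b[j]
        out.append(total)
    return out


def incremental_prefix_sums(b):
    """O(n): reuses the previous iteration's result instead of recomputing it."""
    out = []
    running = 0
    for y in b:
        running += y
        out.append(running)
    return out
```

In both pairs the optimized version computes the same values with a lower asymptotic cost, which is the effect the abstract describes as lowering the computational complexity of a loop nest. The examples use integer inputs; with floating-point data, reordering additions can change rounding.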
