Sparse Matrix Code Dependence Analysis Simplification at Compile Time

Analyzing array-based computations to determine data dependences is useful for many applications, including automatic parallelization, race detection, computation and communication overlap, verification, and shape analysis. For sparse matrix codes, array data dependence analysis is made more difficult by the use of index arrays, which make it possible to store only the nonzero entries of the matrix (e.g., in A[B[i]], B is an index array). Such indirect array accesses often stymie dependence analysis because the values of the index array are not available at compile time. Consequently, many dependences cannot be proven unsatisfiable or fully determined until runtime. Nonetheless, index arrays in sparse matrix codes often have properties, such as monotonicity of their elements, that can be exploited to reduce the amount of runtime analysis needed. In this paper, we contribute a formulation of array data dependence analysis that encodes index array properties as universally quantified constraints. This makes it possible to leverage existing SMT solvers to determine whether such dependences are unsatisfiable, and it significantly reduces the number of dependences that require runtime analysis in a set of eight sparse matrix kernels. Another contribution is an algorithm for simplifying the remaining satisfiable data dependences by discovering equalities and/or subset relationships. These simplifications are essential to make a runtime-inspection-based approach feasible.
