Sparse computation data dependence simplification for efficient compiler-generated inspectors
Tomofumi Yuki | Kazem Cheshmi | Maryam Mehri Dehnavi | Mary W. Hall | Catherine Mills Olschanowsky | Anand Venkat | Michelle Mills Strout | Mahdi Soltan Mohammadi | Eddie C. Davis | Payal Nandy
[1] Sivan Toledo,et al. Elimination Structures in Scientific Computing , 2004, Handbook of Data Structures and Applications.
[2] William Pugh,et al. Iteration space slicing and its application to communication optimization , 1997, ICS '97.
[3] Anoop Gupta,et al. Parallel ICCG on a hierarchical memory multiprocessor - Addressing the triangular solve bottleneck , 1990, Parallel Comput..
[4] Shoaib Kamil,et al. Sympiler: Transforming Sparse Matrix Codes by Decoupling Symbolic Analysis , 2017, SC17: International Conference for High Performance Computing, Networking, Storage and Analysis.
[5] Pradeep Dubey,et al. Efficient Shared-Memory Implementation of High-Performance Conjugate Gradient Benchmark and its Application to Unstructured Matrices , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[6] A. Peirce. Computer Methods in Applied Mechanics and Engineering , 2010 .
[7] Mary W. Hall,et al. The Sparse Polyhedral Framework: Composing Compiler-Generated Inspector-Executor Code , 2018, Proceedings of the IEEE.
[8] Endong Wang,et al. Intel Math Kernel Library , 2014 .
[9] Pradeep Dubey,et al. Sparsifying Synchronization for High-Performance Shared-Memory Sparse Triangular Solver , 2014, ISC.
[10] Andrew Lumsdaine,et al. Sparselib++ v. 1.5 Sparse Matrix Class Library Reference Guide | NIST , 1996 .
[11] John R. Gilbert,et al. Predicting fill for sparse orthogonal factorization , 1986, JACM.
[12] Timothy A. Davis,et al. Accelerating sparse cholesky factorization on GPUs , 2014, IA3 '14.
[13] Henny B. Sipma,et al. What's Decidable About Arrays? , 2006, VMCAI.
[14] Thomas Brandes. The importance of direct dependences for automatic parallelization , 1988, ICS '88.
[15] Alex Pothen,et al. A Mapping Algorithm for Parallel Sparse Cholesky Factorization , 1993, SIAM J. Sci. Comput..
[16] Rudolf Eigenmann,et al. Optimizing irregular shared-memory applications for distributed-memory systems , 2006, PPoPP '06.
[17] Martin Schulz,et al. ARCHER: Effectively Spotting Data Races in Large OpenMP Applications , 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[18] Lawrence Rauchwerger,et al. Hybrid Analysis: Static & Dynamic Memory Reference Analysis , 2004, International Journal of Parallel Programming.
[19] David R. O'Hallaron,et al. Languages, Compilers and Run-Time Systems for Scalable Computers , 1998, Springer US.
[20] Joel H. Saltz,et al. Run-time parallelization and scheduling of loops , 1989, SPAA '89.
[21] Timothy A. Davis,et al. The university of Florida sparse matrix collection , 2011, TOMS.
[22] Larry Carter,et al. An approach for code generation in the Sparse Polyhedral Framework , 2016, Parallel Comput..
[23] D. Kershaw. The incomplete Cholesky—conjugate gradient method for the iterative solution of systems of linear equations , 1978 .
[24] Michael Garland,et al. Implementing sparse matrix-vector multiplication on throughput-oriented processors , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[25] Cesare Tinelli,et al. Finding conflicting instances of quantified formulas in SMT , 2014, 2014 Formal Methods in Computer-Aided Design (FMCAD).
[26] Tomofumi Yuki,et al. Extending Index-Array Properties for Data Dependence Analysis , 2018, LCPC.
[27] David I. August,et al. Automatically exploiting cross-invocation parallelism using runtime information , 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[28] Sven Verdoolaege,et al. isl: An Integer Set Library for the Polyhedral Model , 2010, ICMS.
[29] Ran Zheng,et al. GPU-based multifrontal optimizing method in sparse Cholesky factorization , 2015, 2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP).
[30] Hongbo Rong,et al. Automating Wavefront Parallelization for Sparse Matrix Computations , 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[31] J. Gilbert. Predicting Structure in Sparse Matrix Computations , 1994 .
[32] Katherine Yelick,et al. Automatic Performance Tuning and Analysis of Sparse Triangular Solve , 2002 .
[33] Nancy M. Amato,et al. Run-time methods for parallelizing partially parallel loops , 1995, ICS '95.
[34] Nikolaj Bjørner,et al. Efficient E-Matching for SMT Solvers , 2007, CADE.
[35] Shoaib Kamil,et al. ParSy: Inspection and Transformation of Sparse Matrix Computations for Parallelism , 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[36] Paul Feautrier,et al. Fuzzy array dataflow analysis , 1995, PPOPP '95.
[37] J. Ramanujam,et al. Distributed memory code generation for mixed Irregular/Regular computations , 2015, PPoPP.
[38] Andreas Zeller,et al. Generalized Task Parallelism , 2015, ACM Trans. Archit. Code Optim..
[39] Christof Löding,et al. Foundations for natural proofs and quantifier instantiation , 2017, Proc. ACM Program. Lang..
[40] Chun Chen,et al. Polyhedra scanning revisited , 2012, PLDI.
[41] Katherine Yelick,et al. Autotuning Sparse Matrix-Vector Multiplication for Multicore , 2012 .
[42] Yunheung Paek,et al. Efficient and precise array access analysis , 2002, TOPL.
[43] William Pugh,et al. Constraint-based array dependence analysis , 1998, TOPL.
[44] Daniel Kroening,et al. Decision Procedures - An Algorithmic Point of View , 2008, Texts in Theoretical Computer Science. An EATCS Series.
[45] Viktor Kuncak,et al. On Counterexample Guided Quantifier Instantiation for Synthesis in CVC4 , 2015, ArXiv.
[46] Larry Carter,et al. Sparse Tiling for Stationary Iterative Methods , 2004, Int. J. High Perform. Comput. Appl..
[47] Vipin Kumar,et al. A high performance sparse Cholesky factorization algorithm for scalable parallel computers , 1995, Proceedings Frontiers '95. The Fifth Symposium on the Frontiers of Massively Parallel Computation.
[48] Steven Derrien,et al. Runtime dependency analysis for loop pipelining in High-Level Synthesis , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).
[49] Joel H. Saltz,et al. Aggregation Methods for Solving Sparse Triangular Systems on Multiprocessors , 1990, SIAM J. Sci. Comput..
[50] E. Ng,et al. Predicting structure in nonsymmetric sparse matrix factorizations , 1993 .
[51] Lawrence Rauchwerger,et al. Logical inference techniques for loop parallelization , 2012, PLDI.
[52] William Pugh,et al. Nonlinear array dependence analysis , 1994 .
[53] John R. Gilbert,et al. Highly Parallel Sparse Cholesky Factorization , 1992, SIAM J. Sci. Comput..
[54] Jennifer A. Scott,et al. Design of a Multicore Sparse Cholesky Factorization Using DAGs , 2010, SIAM J. Sci. Comput..
[55] Xiaotong Zhuang,et al. Exploiting Parallelism with Dependence-Aware Scheduling , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[56] Leonardo Mendonça de Moura,et al. Complete Instantiation for Quantified Formulas in Satisfiabiliby Modulo Theories , 2009, CAV.
[57] Nancy M. Amato,et al. A scalable method for run-time loop parallelization , 1995, International Journal of Parallel Programming.
[58] Ulrich Rüde,et al. Cache Optimization for Structured and Unstructured Grid Multigrid , 2000 .
[59] Robert E. Shostak,et al. A Practical Decision Procedure for Arithmetic with Function Symbols , 1979, JACM.
[60] M. Papadrakakis,et al. Accuracy and effectiveness of preconditioned conjugate gradient algorithms for large and ill-conditioned problems , 1993 .
[61] William Pugh,et al. The Omega Library interface guide , 1995 .
[62] Michele Benzi,et al. Robust Approximate Inverse Preconditioning for the Conjugate Gradient Method , 2000, SIAM J. Sci. Comput..
[63] David A. Padua,et al. Compiler analysis of irregular memory accesses , 2000, PLDI '00.
[64] Olaf Schenk,et al. Two-level dynamic scheduling in PARDISO: Improved scalability on shared memory multiprocessing systems , 2002, Parallel Comput..
[65] Pascal Hénon,et al. PaStiX: a high-performance parallel direct solver for sparse symmetric positive definite systems , 2002, Parallel Comput..
[66] Alan George,et al. Communication results for parallel sparse Cholesky factorization on a hypercube , 1989, Parallel Comput..