Loop and data transformations for sparse matrix code
[1] Padma Raghavan,et al. Exploiting dense substructures for fast sparse matrix vector multiplication , 2011, Int. J. High Perform. Comput. Appl..
[2] Albert Cohen,et al. Polyhedral Code Generation in the Real World , 2006, CC.
[3] Aart J. C. Bik,et al. Compiler support for sparse matrix computations , 1996 .
[4] Aart J. C. Bik,et al. Automatic Data Structure Selection and Transformation for Sparse Matrix Computations , 1996, IEEE Trans. Parallel Distributed Syst..
[5] Ken Kennedy,et al. Improving Memory Hierarchy Performance for Irregular Applications Using Data and Computation Reorderings , 2001, International Journal of Parallel Programming.
[6] Paul Feautrier,et al. Automatic Parallelization in the Polytope Model , 1996, The Data Parallel Programming Model.
[7] Joel H. Saltz,et al. Programming Irregular Applications: Runtime Support, Compilation and Tools , 1997, Adv. Comput..
[8] Richard Vuduc,et al. Automatic performance tuning of sparse matrix kernels , 2003 .
[9] Chun Chen,et al. Loop Transformation Recipes for Code Generation and Auto-Tuning , 2009, LCPC.
[10] Chau-Wen Tseng,et al. Exploiting locality for irregular scientific codes , 2006, IEEE Transactions on Parallel and Distributed Systems.
[11] William Pugh,et al. Optimization within a unified transformation framework , 1996 .
[12] Hyun Jin Moon,et al. Fast Sparse Matrix-Vector Multiplication by Exploiting Variable Block Structure , 2005, HPCC.
[13] P. Sadayappan,et al. Stencil-Aware GPU Optimization of Iterative Solvers , 2013, SIAM J. Sci. Comput..
[14] Keshav Pingali,et al. Next-generation generic programming and its application to sparse matrix computations , 2000, ICS '00.
[15] Chun Chen,et al. Polyhedra scanning revisited , 2012, PLDI.
[16] Mary W. Hall,et al. Non-affine Extensions to Polyhedral Code Generation , 2014, CGO '14.
[17] Lawrence Rauchwerger,et al. A Hybrid Approach to Proving Memory Reference Monotonicity , 2011, LCPC.
[18] Samuel Williams,et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[19] Bo Wu,et al. Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU , 2013, PPoPP '13.
[20] Michael Garland,et al. Implementing sparse matrix-vector multiplication on throughput-oriented processors , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[21] Samuel Williams,et al. Optimizing Sparse Matrix-Multiple Vectors Multiplication for Nuclear Configuration Interaction Calculations , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[22] Lawrence Rauchwerger,et al. The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization , 1995, PLDI '95.
[23] Sanjay V. Rajopadhye,et al. Generation of Efficient Nested Loops from Polyhedra , 2000, International Journal of Parallel Programming.
[24] Larry Carter,et al. Localizing non-affine array references , 1999, 1999 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00425).
[25] Xing Liu,et al. Efficient sparse matrix-vector multiplication on x86-based many-core processors , 2013, ICS '13.
[26] Ken Kennedy,et al. Optimizing Compilers for Modern Architectures: A Dependence-based Approach , 2001 .
[27] Larry Carter,et al. An approach for code generation in the Sparse Polyhedral Framework , 2016, Parallel Comput..
[28] Joel H. Saltz,et al. Principles of runtime support for parallel processors , 1988, ICS '88.
[29] Ken Kennedy,et al. Improving cache performance in dynamic applications through data and computation reorganization at run time , 1999, PLDI '99.
[30] Carlos Guestrin,et al. Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud , 2012 .
[31] Harry A. G. Wijshoff,et al. Sublimation: Expanding Data Structures to Enable Data Instance Specific Optimizations , 2010, LCPC.
[32] Keshav Pingali,et al. Sparse code generation for imperfectly nested loops with dependences , 1997, ICS '97.
[33] John R. Gilbert,et al. The Combinatorial BLAS: design, implementation, and applications , 2011, Int. J. High Perform. Comput. Appl..
[34] Chun Chen,et al. A Programming Language Interface to Describe Transformations and Code Generation , 2010, LCPC.
[35] William Pugh,et al. SIPR: A New Framework for Generating Efficient Code for Sparse Matrix Computations , 1998, LCPC.
[36] J. Ramanujam,et al. Code generation for parallel execution of a class of irregular loops on distributed memory systems , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[37] Rudolf Eigenmann,et al. Optimizing irregular shared-memory applications for distributed-memory systems , 2006, PPoPP '06.
[38] John R. Gilbert,et al. Highly Parallel Sparse Matrix-Matrix Multiplication , 2010, ArXiv.
[39] Timothy A. Davis,et al. The University of Florida sparse matrix collection , 2011, TOMS.
[40] Calvin J. Ribbens,et al. Pattern-based sparse matrix representation for memory-efficient SMVM kernels , 2009, ICS.
[41] Corinne Ancourt,et al. Scanning polyhedra with DO loops , 1991, PPOPP '91.
[42] David A. Padua,et al. Compiler analysis of irregular memory accesses , 2000, PLDI '00.
[43] A. J. C. Bik,et al. Advanced compiler optimizations for sparse computations , 1993, Supercomputing '93.
[44] Katherine Yelick,et al. OSKI: A library of automatically tuned sparse matrix kernels , 2005 .
[45] John M. Mellor-Crummey,et al. Optimizing Sparse Matrix–Vector Product Computations Using Unroll and Jam , 2004, Int. J. High Perform. Comput. Appl..
[46] Johannes Hölzl,et al. Specifying and verifying sparse matrix codes , 2010, ICFP '10.
[47] Aart J. C. Bik,et al. On Automatic Data Structure Selection and Code Generation for Sparse Computations , 1993, LCPC.
[48] Keshav Pingali,et al. A Relational Approach to the Compilation of Sparse Matrix Programs , 1997, Euro-Par.