Efficient Representation Scheme for Multidimensional Array Operations