Global communication optimization for tensor contraction expressions under memory constraints
J. Ramanujam | Gerald Baumgartner | Chi-Chung Lam | P. Sadayappan | Xiaoyang Gao | Daniel Cociorva | Sandhya Krishnan
[1] Leonidas J. Guibas, et al. Compilation and delayed evaluation in APL, 1978, POPL.
[2] Chau-Wen Tseng, et al. Improving data locality with loop transformations, 1996, TOPLAS.
[3] Lawrence Snyder, et al. The implementation and evaluation of fusion and contraction in array languages, 1998, PLDI '98.
[4] Gerald Baumgartner, et al. Optimization of Memory Usage Requirement for a Class of Loops Implementing Multi-dimensional Integrals, 1999, LCPC.
[5] Vivek Sarkar, et al. Optimization of array accesses by collective loop transformations, 1991, ICS '91.
[6] Gustavo E. Scuseria, et al. Achieving Chemical Accuracy with Coupled-Cluster Theory, 1995.
[7] David E. Bernholdt, et al. Space-time trade-off optimization for a class of electronic structure calculations, 2002, PLDI '02.
[8] Ken Kennedy. Fast greedy weighted fusion, 2000, ICS '00.
[9] Ken Kennedy, et al. Improving effective bandwidth through compiler enhancement of global and dynamic cache reuse, 2000.
[10] Chi-Chung Lam, et al. Performance optimization of a class of loops implementing multidimensional integrals, 1999.
[11] David E. Bernholdt, et al. Towards Automatic Synthesis of High-Performance Codes for Electronic Structure Calculations: Data Locality Optimization, 2001, HiPC.
[12] Chi-Chung Lam, et al. On Optimizing a Class of Multi-Dimensional Loops with Reductions for Parallel Execution, 1997, Parallel Process. Lett.
[13] Kathryn S. McKinley, et al. A Compiler Optimization Algorithm for Shared-Memory Multiprocessors, 1998, IEEE Trans. Parallel Distributed Syst.
[14] Tarek S. Abdelrahman, et al. Fusion of Loops for Parallelism and Locality, 1997, IEEE Trans. Parallel Distributed Syst.
[15] Cheng Wang, et al. Data locality enhancement by memory reduction, 2001, ICS '01.
[16] P. Kollman, et al. Encyclopedia of computational chemistry, 1998.
[17] V. Sarkar, et al. Collective Loop Fusion for Array Contraction, 1992, LCPC.
[18] Alain Darte. On the Complexity of Loop Fusion, 2000, Parallel Comput.
[19] Ken Kennedy, et al. Improving register allocation for subscripted variables, 1990, PLDI '90.
[20] Kathryn S. McKinley, et al. A Parametrized Loop Fusion Algorithm for Improving Parallelism and Cache Locality, 1997, Comput. J.
[21] Ken Kennedy, et al. Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution, 1993, LCPC.