CHiLL: A Framework for Composing High-Level Loop Transformations
[1] W. Pugh, et al. A framework for unifying reordering transformations, 1993.
[2] W. Jalby, et al. To copy or not to copy: a compile-time technique for assessing when data copying should be used to eliminate cache conflicts, 1993, Supercomputing '93.
[3] Ken Kennedy, et al. Improving the ratio of memory operations to floating-point operations in loops, 1994, TOPL.
[4] Jingling Xue. Automating Non-Unimodular Loop Transformations for Massive Parallelism, 1994, Parallel Comput.
[5] Kathryn S. McKinley, et al. Tile size selection using cache organization and data layout, 1995, PLDI '95.
[6] Loop Transformation Using Nonunimodular Matrices, 1995, IEEE Trans. Parallel Distributed Syst.
[7] William Pugh, et al. The Omega Library interface guide, 1995.
[8] William Pugh, et al. Optimization within a unified transformation framework, 1996.
[9] Chau-Wen Tseng, et al. Improving data locality with loop transformations, 1996, TOPL.
[10] James Demmel, et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology, 1997, ICS '97.
[11] Chau-Wen Tseng, et al. Data transformations for eliminating conflict misses, 1998, PLDI.
[12] Monica S. Lam, et al. Maximizing Parallelism and Minimizing Synchronization with Affine Partitions, 1998, Parallel Comput.
[13] Matteo Frigo, et al. A fast Fourier transform compiler, 1999, SIGP.
[14] Keshav Pingali, et al. Tiling Imperfectly-nested Loop Nests, 2000, ACM/IEEE SC 2000 Conference (SC'00).
[15] Ken Kennedy, et al. Optimizing Compilers for Modern Architectures: A Dependence-based Approach, 2001.
[16] Jack J. Dongarra, et al. Automated empirical optimizations of software and the ATLAS project, 2001, Parallel Comput.
[17] Marta Jiménez, et al. Register tiling in nonrectangular iteration spaces, 2002, TOPL.
[18] Gang Ren, et al. A comparison of empirical and model-driven optimization, 2003, PLDI '03.
[19] Ken Kennedy, et al. Transforming Complex Loop Nests for Locality, 2004, The Journal of Supercomputing.
[20] A data locality optimizing algorithm, 2004, SIGP.
[21] Cédric Bastoul, et al. Code generation in the polyhedral model is easier than you think, 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004.
[22] J. Ramanujam, et al. Beyond unimodular transformations, 1995, The Journal of Supercomputing.
[23] Gang Ren, et al. Is Search Really Necessary to Generate High-Performance BLAS?, 2005, Proceedings of the IEEE.
[24] Chun Chen, et al. Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy, 2005, International Symposium on Code Generation and Optimization.
[25] Keshav Pingali, et al. Think globally, search locally, 2005, ICS '05.
[26] David A. Padua, et al. A Language for the Compact Representation of Multiple Program Versions, 2005, LCPC.
[27] David Parello, et al. Semi-Automatic Composition of Loop Transformations for Deep Parallelism and Memory Hierarchies, 2006, International Journal of Parallel Programming.
[28] William Jalby, et al. Loop Optimization using Hierarchical Compilation and Kernel Decomposition, 2007, International Symposium on Code Generation and Optimization (CGO'07).
[29] Albert Cohen, et al. Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time, 2007, International Symposium on Code Generation and Optimization (CGO'07).
[30] Keshav Pingali, et al. A singular loop transformation framework based on non-singular matrices, 1992, International Journal of Parallel Programming.