Tiling optimizations for stencil computations
[1] Rudolf Eigenmann et al. Experiences in Using Cetus for Source-to-Source Transformations, 2004, LCPC.
[2] Kevin Skadron et al. Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs, 2009, ICS.
[3] David A. Padua et al. Programming for parallelism and locality with hierarchically tiled arrays, 2006, PPoPP '06.
[4] Benoît Meister et al. A mapping path for multi-GPGPU accelerated computers from a portable high level programming abstraction, 2010, GPGPU-3.
[5] Ian T. Foster et al. Cactus Application: Performance Predictions in Grid Environments, 2001, Euro-Par.
[6] François Irigoin et al. Supernode partitioning, 1988, POPL '88.
[7] Keshav Pingali et al. An experimental evaluation of tiling and shackling for memory hierarchy management, 1999, ICS '99.
[8] Bowen Alpern et al. Hierarchical Tiling: A Methodology for High Performance, 1996.
[9] Olgierd Wojtasiewicz et al. Elements of mathematical logic, 1964.
[10] Xing Zhou et al. BulkCompactor: Optimized deterministic execution via Conflict-Aware commit of atomic blocks, 2012, IEEE International Symposium on High-Performance Computer Architecture.
[11] Uday Bondhugula et al. A practical automatic polyhedral parallelizer and locality optimizer, 2008, PLDI '08.
[12] James Demmel et al. Avoiding communication in sparse matrix computations, 2008, IEEE International Symposium on Parallel and Distributed Processing.
[13] William Pugh et al. The Omega Library interface guide, 1995.
[14] Xing Zhou et al. Scheduling of stream-based real-time applications for heterogeneous systems, 2011, LCTES '11.
[15] Kathryn S. McKinley et al. Tile size selection using cache organization and data layout, 1995, PLDI '95.
[16] David G. Wonnacott et al. Achieving Scalable Locality with Time Skewing, 2002, International Journal of Parallel Programming.
[17] Mahmut T. Kandemir et al. On-chip cache hierarchy-aware tile scheduling for multicore machines, 2011, International Symposium on Code Generation and Optimization (CGO 2011).
[18] Mahmut T. Kandemir et al. Memory system optimization of embedded software, 2003, Proc. IEEE.
[19] Keshav Pingali et al. Tiling Imperfectly-nested Loop Nests, 2000, ACM/IEEE SC 2000 Conference (SC'00).
[20] Sriram Krishnamoorthy et al. Parametric multi-level tiling of imperfectly nested loops, 2009, ICS.
[21] Jingling Xue et al. Reuse-Driven Tiling for Improving Data Locality, 1998, International Journal of Parallel Programming.
[22] J. Ramanujam et al. Tiling multidimensional iteration spaces for nonshared memory machines, 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[23] Michael Wolfe et al. Iteration Space Tiling for Memory Hierarchies, 1987, PPSC.
[24] Wenguang Chen et al. Cache Sharing Management for Performance Fairness in Chip Multiprocessors, 2009, 18th International Conference on Parallel Architectures and Compilation Techniques.
[25] Monica S. Lam et al. A data locality optimizing algorithm, 1991, PLDI '91.
[26] Larry Carter et al. Selecting tile shape for minimal execution time, 1999, SPAA '99.
[27] Basilio B. Fraguela et al. The Hierarchically Tiled Arrays programming approach, 2004, LCR.
[28] Ganesh Bikshandi et al. Parallel Programming With Hierarchically Tiled Arrays, 2007.
[29] Bradley C. Kuszmaul et al. The pochoir stencil compiler, 2011, SPAA '11.
[30] Uday Bondhugula et al. Effective automatic parallelization of stencil computations, 2007, PLDI '07.
[31] Michael Wolfe et al. More iteration space tiling, 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).
[32] Sanjay V. Rajopadhye et al. Optimal semi-oblique tiling, 2001, SPAA '01.
[33] G. Kreisel et al. Elements of Mathematical Logic: Model Theory, 1971.
[34] Zhiyuan Li et al. New tiling techniques to improve cache temporal locality, 1999, PLDI '99.
[35] Sanjay V. Rajopadhye et al. A Geometric Programming Framework for Optimal Multi-Level Tiling, 2004, Proceedings of the ACM/IEEE SC2004 Conference.
[36] Duncan H. Lawrie et al. On the Performance Enhancement of Paging Systems Through Program Analysis and Transformations, 1981, IEEE Transactions on Computers.
[37] Mark Alpert. Not Just Fun and Games, 1999.
[38] Michael E. Wolf et al. Combining Loop Transformations Considering Caches and Scheduling, 2004, International Journal of Parallel Programming.
[39] Larry Carter et al. Hierarchical tiling for improved superscalar performance, 1995, Proceedings of the 9th International Parallel Processing Symposium.
[40] Jingling Xue et al. Loop Tiling for Parallelism, 2000, Kluwer International Series in Engineering and Computer Science.
[41] Sanjay V. Rajopadhye et al. Optimal Orthogonal Tiling of 2-D Iterations, 1997, J. Parallel Distributed Comput.
[42] Xing Zhou et al. Hierarchical overlapped tiling, 2012, CGO '12.
[43] Hiroshi Ohta et al. Optimal tile size adjustment in compiling general DOACROSS loop nests, 1995, ICS '95.
[44] Monica S. Lam et al. An affine partitioning algorithm to maximize parallelism and minimize communication, 1999, ICS '99.
[45] Monica S. Lam et al. A Loop Transformation Theory and an Algorithm to Maximize Parallelism, 1991, IEEE Trans. Parallel Distributed Syst.
[46] David G. Wonnacott et al. Time Skewing for Parallel Computers, 1999, LCPC.
[47] Corinne Ancourt et al. Scanning polyhedra with DO loops, 1991, PPoPP '91.
[48] Kevin Skadron et al. Compact thermal modeling for temperature-aware design, 2004, Proceedings of the 41st Design Automation Conference.
[49] Jingling Xue. Communication-Minimal Tiling of Uniform Dependence Loops, 1997, J. Parallel Distributed Comput.
[50] Viktor K. Prasanna et al. Tiling, Block Data Layout, and Memory Hierarchy Performance, 2003, IEEE Trans. Parallel Distributed Syst.