Tiling and optimizing time-iterated computations over periodic domains
[1] Uday Bondhugula,et al. Effective automatic parallelization of stencil computations , 2007, PLDI '07.
[2] Bradley C. Kuszmaul,et al. The Pochoir stencil compiler , 2011, SPAA '11.
[3] Keshav Pingali,et al. Synthesizing Transformations for Locality Enhancement of Imperfectly-Nested Loop Nests , 2001, International Journal of Parallel Programming.
[4] Michael Wolfe,et al. More iteration space tiling , 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).
[5] Michael Wolfe,et al. High performance compilers for parallel computing , 1995 .
[6] David Parello,et al. Facilitating the search for compositions of program transformations , 2005, ICS '05.
[7] Sanjay V. Rajopadhye,et al. Smashing: Folding Space to Tile through Time , 2008, LCPC.
[8] Monica S. Lam,et al. Maximizing parallelism and minimizing synchronization with affine transforms , 1997, POPL '97.
[9] Richard Veras,et al. A stencil compiler for short-vector SIMD architectures , 2013, ICS '13.
[10] Paul Feautrier,et al. Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time , 1992, International Journal of Parallel Programming.
[11] Samuel Williams,et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[12] Alfred V. Aho,et al. Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.
[13] R. Sadourny. The Dynamics of Finite-Difference Models of the Shallow-Water Equations , 1975 .
[14] Gerhard Wellein,et al. Efficient multicore-aware parallelization strategies for iterative stencil computations , 2010, J. Comput. Sci..
[15] Franz Franchetti,et al. Data Layout Transformation for Stencil Computations on Short-Vector SIMD Architectures , 2011, CC.
[16] Sanjay Rajopadhye,et al. Piecewise Linear Schedules For Recurrence Equations , 1992, Workshop on VLSI Signal Processing.
[17] J. Ramanujam,et al. Tiling Multidimensional Iteration Spaces for Multicomputers , 1992, J. Parallel Distributed Comput..
[18] Uday Bondhugula,et al. A practical automatic polyhedral parallelizer and locality optimizer , 2008, PLDI '08.
[19] David G. Wonnacott,et al. Using time skewing to eliminate idle time due to memory bandwidth and network limitations , 2000, Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000.
[20] A Thesis,et al. Tiling Stencil Computations to Maximize Parallelism , 2013 .
[21] William Pugh,et al. Iteration space slicing and its application to communication optimization , 1997, ICS '97.
[22] Martin Griebl,et al. Index Set Splitting , 2000, International Journal of Parallel Programming.
[23] Uday Bondhugula,et al. Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model , 2008, CC.
[24] Sven Verdoolaege,et al. An integer set library for program analysis , 2009 .
[25] Paul Feautrier,et al. Some efficient solutions to the affine scheduling problem. I. One-dimensional time , 1992, International Journal of Parallel Programming.
[26] Leslie Lamport. The Hyperplane Method for an Array Computer , 1974, Sagamore Computer Conference.
[27] Jingling Xue,et al. Loop Tiling for Parallelism , 2000, Kluwer International Series in Engineering and Computer Science.
[28] David Parello,et al. Semi-Automatic Composition of Loop Transformations for Deep Parallelism and Memory Hierarchies , 2006, International Journal of Parallel Programming.
[29] Utpal Banerjee,et al. Loop Transformations for Restructuring Compilers: The Foundations , 1993, Springer US.
[30] D. Wonnacott,et al. On the Scalability of Loop Tiling Techniques , 2012 .
[31] Todd D. Ringler,et al. Climate modeling with spherical geodesic grids , 2002, Comput. Sci. Eng..
[32] Sven Verdoolaege,et al. isl: An Integer Set Library for the Polyhedral Model , 2010, ICMS.
[33] Zhiyuan Li,et al. New tiling techniques to improve cache temporal locality , 1999, PLDI '99.
[34] Christian Choffrut,et al. Folding of the Plane and the Design of Systolic Arrays , 1983, Inf. Process. Lett..
[35] P. Feautrier. Some Efficient Solutions to the Affine Scheduling Problem. Part II. Multidimensional Time , 1992 .
[37] Hans-Peter Seidel,et al. Cache oblivious parallelograms in iterative stencil computations , 2010, ICS '10.
[38] David K. Smith. Theory of Linear and Integer Programming , 1987 .
[39] Peter R. Cappello,et al. Converting affine recurrence equations to quasi-uniform recurrence equations , 1988, J. VLSI Signal Process..
[40] Hans-Peter Seidel,et al. Cache Accurate Time Skewing in Iterative Stencil Computations , 2011, 2011 International Conference on Parallel Processing.
[41] François Irigoin,et al. Supernode partitioning , 1988, POPL '88.