Parallel Tiled Code Generation with Loop Permutation within Tiles
[1] Monica S. Lam, et al. A Loop Transformation Theory and an Algorithm to Maximize Parallelism, 1991, IEEE Trans. Parallel Distributed Syst.
[2] Marek Palkowski, et al. Perfectly Nested Loop Tiling Transformations Based on the Transitive Closure of the Program Dependence Graph, 2014, ACS.
[3] William Pugh, et al. Iteration space slicing and its application to communication optimization, 1997, ICS '97.
[4] Marek Palkowski, et al. TRACO: An automatic loop nest parallelizer for numerical applications, 2015, 2015 Federated Conference on Computer Science and Information Systems (FedCSIS).
[5] Cédric Bastoul, et al. Code generation in the polyhedral model is easier than you think, 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004.
[6] Uday Bondhugula, et al. Effective automatic parallelization of stencil computations, 2007, PLDI '07.
[7] Jingling Xue, et al. Loop Tiling for Parallelism, 2000, Kluwer International Series in Engineering and Computer Science.
[8] J. Ramanujam, et al. Tiling Multidimensional Iteration Spaces for Multicomputers, 1992, J. Parallel Distributed Comput.
[9] Wlodzimierz Bielecki, et al. Using Basis Dependence Distance Vectors to Calculate the Transitive Closure of Dependence Relations by Means of the Floyd-Warshall Algorithm, 2013, COCOA.
[10] Paul Feautrier, et al. Some efficient solutions to the affine scheduling problem. I. One-dimensional time, 1992, International Journal of Parallel Programming.
[11] Martin Griebl, et al. Automatic Parallelization of Loop Programs for Distributed Memory Architectures, 2004.
[12] Albert Cohen, et al. Coarse-Grained Loop Parallelization: Iteration Space Slicing vs Affine Transformations, 2009, ISPDC.
[13] Uday Bondhugula, et al. Tiling for Dynamic Scheduling, 2014.
[14] Uday Bondhugula, et al. A practical automatic polyhedral parallelizer and locality optimizer, 2008, PLDI '08.
[15] Anna Beletska, et al. An Iterative Algorithm of Computing the Transitive Closure of a Union of Parameterized Affine Integer Tuple Relations, 2010, COCOA.
[16] Monica S. Lam, et al. An affine partitioning algorithm to maximize parallelism and minimize communication, 1999, ICS '99.
[17] Paul Feautrier, et al. Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time, 1992, International Journal of Parallel Programming.
[18] William Pugh, et al. Transitive Closure of Infinite Graphs and its Applications, 1995, Int. J. Parallel Program.
[19] Marek Palkowski, et al. Free scheduling for statement instances of parameterized arbitrarily nested affine loops, 2012, Parallel Comput.
[20] Marek Palkowski, et al. Free Scheduling of Tiles Based on the Transitive Closure of Dependence Graphs, 2015, PPAM.
[21] William Pugh, et al. The Omega Library interface guide, 1995.
[22] Monica S. Lam, et al. Communication-Free Parallelization via Affine Transformations, 1994, LCPC.
[23] Monica S. Lam, et al. A data locality optimizing algorithm, 1991, PLDI '91.
[24] G. Shipman, et al. Omega Library, 2011, Encyclopedia of Parallel Computing.
[25] Albert Cohen, et al. Transitive Closures of Affine Integer Tuple Relations and Their Overapproximations, 2011, SAS.
[26] Uday Bondhugula, et al. Effective automatic parallelization and locality optimization using the polyhedral model, 2008.
[27] D. Wonnacott, et al. On the Scalability of Loop Tiling Techniques, 2012.
[28] David Wonnacott, et al. Automatic Tiling of “Mostly-Tileable” Loop Nests, 2014.