An Effective Fusion and Tile Size Model for PolyMage
[1] Ken Kennedy. Fast greedy weighted fusion, 2000, ICS '00.
[2] Frédo Durand, et al. Decoupling algorithms from schedules for easy optimization of image processing pipelines, 2012, ACM Trans. Graph.
[3] Uday Bondhugula, et al. Effective automatic parallelization of stencil computations, 2007, PLDI '07.
[4] Uday Bondhugula, et al. An effective fusion and tile size model for optimizing image processing pipelines, 2018, PPoPP.
[5] Sebastian Hack, et al. Polyhedral expression propagation, 2018, CC.
[6] Xing Zhou, et al. Hierarchical overlapped tiling, 2012, CGO '12.
[7] Pen-Chung Yew, et al. Tile size selection revisited, 2013, ACM Trans. Archit. Code Optim.
[8] Uday Bondhugula, et al. Optimizing geometric multigrid method computation using a DSL approach, 2017, SC.
[9] Ken Kennedy, et al. Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution, 1993, LCPC.
[10] Uday Bondhugula, et al. PolyMage: Automatic Optimization for Image Processing Pipelines, 2015, ASPLOS.
[11] Ken Kennedy, et al. Loop fusion in high performance Fortran, 1998, ICS '98.
[12] Gihan R. Mudalige, et al. Loop Tiling in Large-Scale Stencil Codes at Run-Time with OPS, 2017, IEEE Transactions on Parallel and Distributed Systems.
[13] Catherine Mills Olschanowsky, et al. A Study on Balancing Parallelism, Data Locality, and Recomputation in Existing PDE Solvers, 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[14] Vivek Sarkar, et al. Analytical Bounds for Optimal Tile Size Selection, 2012, CC.
[15] Catherine Mills Olschanowsky, et al. Transforming loop chains via macro dataflow graphs, 2018, CGO.
[16] Samuel Williams, et al. Compiler generation and autotuning of communication-avoiding operators for geometric multigrid, 2013, 20th Annual International Conference on High Performance Computing.
[17] V. Sarkar, et al. Collective Loop Fusion for Array Contraction, 1992, LCPC.
[18] Ken Kennedy, et al. Profitable loop fusion and tiling using model-driven empirical search, 2006, ICS '06.
[19] Uday Bondhugula, et al. A model for fusion and code motion in an automatic parallelizing compiler, 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[20] Vivek Sarkar, et al. Optimal weighted loop fusion for parallel programs, 1997, SPAA '97.
[21] Frédo Durand, et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines, 2013, PLDI 2013.
[22] David G. Wonnacott, et al. Time Skewing for Parallel Computers, 1999, LCPC.
[23] Jonathan Ragan-Kelley, et al. Automatically scheduling Halide image processing pipelines, 2016, ACM Trans. Graph.
[24] Ken Kennedy, et al. Improving Memory Hierarchy Performance through Combined Loop Interchange and Multi-Level Fusion, 2004, Int. J. High Perform. Comput. Appl.