A framework for enhancing data reuse via associative reordering
J. Ramanujam | P. Sadayappan | Fabrice Rastello | Louis-Noël Pouchet | Tobias Grosser | Martin Kong | Kevin Stock
[1] Keshav Pingali, et al. Exploiting the commutativity lattice, 2011, PLDI '11.
[2] Albert Cohen, et al. Iterative optimization in the polyhedral model: Part II, multidimensional time, 2008, PLDI '08.
[3] Frédéric Vivien, et al. Combining Retiming and Scheduling Techniques for Loop Parallelization and Loop Tiling, 1997, Parallel Process. Lett.
[4] M. Abramowitz, et al. Handbook of Mathematical Functions With Formulas, Graphs and Mathematical Tables (National Bureau of Standards Applied Mathematics Series No. 55), 1965.
[5] Katherine Yelick, et al. Auto-tuning stencil codes for cache-based multicore platforms, 2009.
[6] Martin C. Rinard, et al. Commutativity analysis: a new analysis technique for parallelizing compilers, 1997, TOPLAS.
[7] Cédric Bastoul, et al. Code generation in the polyhedral model is easier than you think, 2004, Proceedings of the 13th International Conference on Parallel Architecture and Compilation Techniques (PACT 2004).
[8] Yao Zhang, et al. Scan primitives for GPU computing, 2007, GH '07.
[9] Paul Feautrier, et al. Detection of Recurrences in Sequential Programs with Loops, 1993, PARLE.
[10] Jeffrey W. Banks, et al. Upwind schemes for the wave equation in second-order form, 2012, J. Comput. Phys.
[11] Samuel Williams, et al. Roofline: an insightful visual performance model for multicore architectures, 2009, CACM.
[12] Richard Veras, et al. A stencil compiler for short-vector SIMD architectures, 2013, ICS '13.
[13] Naga K. Govindaraju, et al. Fast scan algorithms on graphics processors, 2008, ICS '08.
[14] Sven Verdoolaege, et al. isl: An Integer Set Library for the Polyhedral Model, 2010, ICMS.
[15] Chun Chen, et al. Hierarchical parallelization and optimization of high-order stencil computations on multicore clusters, 2012, The Journal of Supercomputing.
[16] Albert Cohen, et al. Automatic Correction of Loop Transformations, 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).
[17] Nathan Clark, et al. Commutativity analysis for software parallelization: letting program transformations see the big picture, 2009, ASPLOS '09.
[18] Edwin Hsing-Mean Sha, et al. Optimizing DSP flow graphs via schedule-based multidimensional retiming, 1996, IEEE Trans. Signal Process.
[19] Richard Veras, et al. When polyhedral transformations meet SIMD code generation, 2013, PLDI '13.
[20] Yun Zhang, et al. Commutative set: a language extension for implicit parallel programming, 2011, PLDI '11.
[21] José María Cela, et al. Introducing the Semi-stencil Algorithm, 2009, PPAM.
[22] Frédo Durand, et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines, 2013, PLDI '13.
[23] Linda G. Shapiro, et al. Computer and Robot Vision, 1991.
[24] Keith D. Cooper, et al. Value-driven redundancy elimination, 1996.
[25] Paul Feautrier, et al. Dataflow analysis of array and scalar references, 1991, International Journal of Parallel Programming.
[26] Wei Liu, et al. Speculative parallelization of partial reduction variables, 2010, CGO '10.
[27] Uday Bondhugula, et al. Loop transformations: convexity, pruning and optimization, 2011, POPL '11.
[28] Jason Cong, et al. Polyhedral-based data reuse optimization for configurable computing, 2013, FPGA '13.
[29] Sanjay V. Rajopadhye, et al. Generation of Efficient Nested Loops from Polyhedra, 2000, International Journal of Parallel Programming.
[30] Yves Robert, et al. Circuit Retiming Applied to Decomposed Software Pipelining, 1998, IEEE Trans. Parallel Distributed Syst.
[31] Soo-Mook Moon, et al. Rotating Register Allocation for Enhanced Pipeline Scheduling, 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).
[32] P. Sadayappan, et al. High-performance code generation for stencil computations on GPU architectures, 2012, ICS '12.
[33] David H. Bailey, et al. The NAS parallel benchmarks summary and preliminary results, 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[34] Sanjay V. Rajopadhye, et al. Scan detection and parallelization in "inherently sequential" nested loop programs, 2012, CGO '12.
[35] Franz Franchetti, et al. Data Layout Transformation for Stencil Computations on Short-Vector SIMD Architectures, 2011, CC.
[36] Edwin Hsing-Mean Sha, et al. Achieving Full Parallelism Using Multidimensional Retiming, 1996, IEEE Trans. Parallel Distributed Syst.
[37] Guy E. Blelloch, et al. Scans as Primitive Parallel Operations, 1989, ICPP.
[38] Steven J. Deitz, et al. Eliminating redundancies in sum-of-product array computations, 2001, ICS '01.
[39] P. Sadayappan, et al. StVEC: A Vector Instruction Extension for High Performance Stencil Computation, 2011, International Conference on Parallel Architectures and Compilation Techniques (PACT 2011).