Model-driven transformations for multi- and many-core CPUs
[1] Albert Cohen, et al. Polyhedral-Model Guided Loop-Nest Auto-Vectorization, 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[2] Uday Bondhugula, et al. A compiler framework for optimization of affine loop nests for GPGPUs, 2008, ICS '08.
[3] Nicolas Vasilache, et al. Joint Scheduling and Layout Optimization to Enable Multi-Level Vectorization, 2012.
[4] Chun Chen, et al. Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy, 2005, International Symposium on Code Generation and Optimization.
[5] Torsten Hoefler, et al. Polly-ACC Transparent compilation to heterogeneous hardware, 2016, ICS.
[6] Ken Kennedy, et al. Estimating Interlock and Improving Balance for Pipelined Architectures, 1988, J. Parallel Distributed Comput.
[7] Paul Feautrier. Some efficient solutions to the affine scheduling problem. I. One-dimensional time, 1992, International Journal of Parallel Programming.
[8] Uday Bondhugula, et al. Effective automatic parallelization of stencil computations, 2007, PLDI '07.
[9] Chau-Wen Tseng, et al. Improving data locality with loop transformations, 1996, ACM Trans. Program. Lang. Syst.
[10] Vivek Sarkar, et al. Oil and Water Can Mix: An Integration of Polyhedral and AST-Based Transformations, 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[11] Francky Catthoor, et al. Polyhedral parallel code generation for CUDA, 2013, TACO.
[12] Vivek Sarkar, et al. Modeling the conflicting demands of parallelism and temporal/spatial locality in affine scheduling, 2018, CC.
[13] Richard Veras, et al. When polyhedral transformations meet SIMD code generation, 2013, PLDI.
[14] Paul Feautrier. Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time, 1992, International Journal of Parallel Programming.
[15] Uday Bondhugula, et al. The Pluto+ Algorithm, 2016, ACM Trans. Program. Lang. Syst.
[16] Paul Feautrier. Dataflow analysis of array and scalar references, 1991, International Journal of Parallel Programming.
[17] Uday Bondhugula, et al. Loop transformations: convexity, pruning and optimization, 2011, POPL '11.
[18] Uday Bondhugula, et al. An effective fusion and tile size model for optimizing image processing pipelines, 2018, PPoPP.
[19] Uday Bondhugula, et al. Combined iterative and model-driven optimization in an automatic parallelization framework, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[20] Chun Chen, et al. Loop Transformation Recipes for Code Generation and Auto-Tuning, 2009, LCPC.
[21] Chun Chen, et al. A scalable auto-tuning framework for compiler optimization, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[22] Paul Feautrier. Some efficient solutions to the affine scheduling problem, 1992.
[23] Samuel Williams, et al. An auto-tuning framework for parallel multicore stencil computations, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[24] Uday Bondhugula, et al. Tiling stencil computations to maximize parallelism, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[25] Benoît Meister, et al. A mapping path for multi-GPGPU accelerated computers from a portable high level programming abstraction, 2010, GPGPU-3.
[26] Ken Kennedy, et al. Automatic translation of FORTRAN programs to vector form, 1987, ACM Trans. Program. Lang. Syst.
[27] Uday Bondhugula, et al. A practical automatic polyhedral parallelizer and locality optimizer, 2008, PLDI '08.
[28] Helmar Burkhart, et al. PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures, 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[29] Richard Veras, et al. A stencil compiler for short-vector SIMD architectures, 2013, ICS '13.
[30] J. Ramanujam, et al. A framework for enhancing data reuse via associative reordering, 2014, PLDI.
[31] Albert Cohen, et al. The Polyhedral Model Is More Widely Applicable Than You Think, 2010, CC.
[32] Louis-Noël Pouchet, et al. A Performance Vocabulary for Affine Loop Transformations, 2018, ArXiv.
[33] Sven Verdoolaege, et al. Extending Pluto-Style Polyhedral Scheduling with Consecutivity, 2018.