Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code
Shoaib Kamil | Emanuele Del Sozzo | Riyadh Baghdadi | Yunming Zhang | Saman P. Amarasinghe | Patricia Suriana | Jessica Ray | Malek Ben Romdhane | Abdurrahman Akkas
[1] Paul Feautrier,et al. Polyhedron Model , 2011, Encyclopedia of Parallel Computing.
[2] Sanjay V. Rajopadhye,et al. Optimizing memory usage in the polyhedral model , 2000, TOPL.
[3] Cédric Bastoul,et al. Code generation in the polyhedral model is easier than you think , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004.
[4] Alain Darte,et al. New Complexity Results on Array Contraction and Related Problems , 2005, J. VLSI Signal Process.
[5] Lawrence G. Roberts,et al. Machine Perception of Three-Dimensional Solids , 1963, Outstanding Dissertations in the Computer Sciences.
[6] William Detmold,et al. Nuclear correlation functions in lattice QCD , 2012, arXiv:1207.1452.
[7] Albert Cohen,et al. Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions , 2018, ArXiv.
[8] Uday Bondhugula,et al. Effective automatic parallelization of stencil computations , 2007, PLDI '07.
[9] Monica S. Lam,et al. Communication optimization and code generation for distributed memory machines , 1993, PLDI '93.
[10] Manish Gupta,et al. On privatization of variables for data-parallel execution , 1997, Proceedings 11th International Parallel Processing Symposium.
[11] Kunle Olukotun,et al. A domain-specific approach to heterogeneous parallelism , 2011, PPoPP '11.
[12] Monica S. Lam,et al. Data Dependence and Data-Flow Analysis of Arrays , 1992, LCPC.
[13] Wei Huang,et al. Design of High Performance MVAPICH2: MPI2 over InfiniBand , 2006, Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06).
[14] Albert Cohen,et al. PENCIL Language Specification , 2015 .
[15] Sean Lee,et al. NOVA: A Functional Language for Data Parallelism , 2014, ARRAY@PLDI.
[16] Zhiyuan Li. Array privatization for parallel execution of loops , 1992, ICS.
[17] Monica S. Lam,et al. A Loop Transformation Theory and an Algorithm to Maximize Parallelism , 1991, IEEE Trans. Parallel Distributed Syst.
[18] Marco D. Santambrogio,et al. A Unified Backend for Targeting FPGAs from DSLs , 2018, 2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP).
[19] Albert Cohen,et al. The Polyhedral Model Is More Widely Applicable Than You Think , 2010, CC.
[20] Shoaib Kamil,et al. GraphIt: a high-performance graph DSL , 2018, Proc. ACM Program. Lang..
[21] Uday Bondhugula,et al. A practical automatic polyhedral parallelizer and locality optimizer , 2008, PLDI '08.
[22] Mary W. Hall,et al. CHiLL : A Framework for Composing High-Level Loop Transformations , 2007 .
[23] Monica S. Lam,et al. Array-data flow analysis and its use in array privatization , 1993, POPL '93.
[24] Christian Lengauer,et al. Polly - Performing Polyhedral Optimizations on a Low-Level Intermediate Representation , 2012, Parallel Process. Lett.
[25] Robert A. van de Geijn,et al. Anatomy of high-performance matrix multiplication , 2008, TOMS.
[26] Frédéric Vivien,et al. A unified framework for schedule and storage optimization , 2001, PLDI '01.
[27] Uday Bondhugula. Compiling affine loop nests for distributed-memory parallel architectures , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[28] Tomofumi Yuki,et al. AlphaZ: A System for Design Space Exploration in the Polyhedral Model , 2012, LCPC.
[29] David Parello,et al. Semi-Automatic Composition of Loop Transformations for Deep Parallelism and Memory Hierarchies , 2006, International Journal of Parallel Programming.
[30] Paul Feautrier,et al. Automatic Storage Management for Parallel Programs , 1998, Parallel Comput.
[31] Paul Feautrier,et al. Dataflow analysis of array and scalar references , 1991, International Journal of Parallel Programming.
[32] Richard W. Vuduc,et al. POET: Parameterized Optimizations for Empirical Tuning , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[33] Uday Bondhugula,et al. Loop transformations: convexity, pruning and optimization , 2011, POPL '11.
[34] David A. Padua,et al. Automatic Array Privatization , 1993, Compiler Optimizations for Scalable Parallel Systems Languages.
[35] Chun Chen,et al. Loop Transformation Recipes for Code Generation and Auto-Tuning , 2009, LCPC.
[36] Haichen Shen,et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning , 2018 .
[37] Henk Corporaal,et al. Extending Halide to Improve Software Development for Imaging DSPs , 2017, TACO.
[38] Frédo Durand,et al. Decoupling algorithms from schedules for easy optimization of image processing pipelines , 2012, ACM Trans. Graph..
[39] Michel Steuwer,et al. LIFT: A functional data-parallel IR for high-performance GPU code generation , 2017, 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[40] Sven Verdoolaege,et al. isl: An Integer Set Library for the Polyhedral Model , 2010, ICMS.
[41] Paul Feautrier,et al. Array expansion , 1988, ICS '88.
[42] Frédo Durand,et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines , 2013, PLDI.
[43] Uday Bondhugula,et al. PolyMage: Automatic Optimization for Image Processing Pipelines , 2015, ASPLOS.
[44] Shoaib Kamil,et al. Distributed Halide , 2016, PPoPP.
[45] François Irigoin,et al. Supernode partitioning , 1988, POPL '88.
[46] Albert Cohen,et al. Hybrid Hexagonal/Classical Tiling for GPUs , 2014, CGO '14.
[47] Albert Cohen,et al. GRAPHITE Two Years After: First Lessons Learned From Real-World Polyhedral Compilation , 2010 .
[48] Elnar Hajiyev,et al. PENCIL: A Platform-Neutral Compute Intermediate Language for Accelerator Programming , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).