Declarative Loop Tactics for Domain-specific Optimization
Henk Corporaal | Tobias Grosser | Oleksandr Zinenko | Lorenzo Chelini
[1] Hal Finkel,et al. A Proposal for Loop-Transformation Pragmas , 2018, IWOMP.
[2] Sven Verdoolaege,et al. Schedule Trees , 2013 .
[3] Rudolf Eigenmann,et al. Cetus - An Extensible Compiler Infrastructure for Source-to-Source Transformation , 2003, LCPC.
[4] Michael Wolfe,et al. Beyond induction variables: detecting and classifying sequences using a demand-driven SSA form , 1995, TOPL.
[5] Juan Touriño,et al. XARK: An extensible framework for automatic recognition of computational kernels , 2008, TOPL.
[6] Tomofumi Yuki,et al. AlphaZ: A System for Design Space Exploration in the Polyhedral Model , 2012, LCPC.
[7] Michael Kruse,et al. High-Performance Generalized Tensor Operations , 2018, ACM Trans. Archit. Code Optim..
[8] Gabe Rudy,et al. CUDA-CHiLL: A programming language interface for GPGPU optimizations and code generation , 2010 .
[9] Franz Franchetti,et al. Data Layout Transformation for Stencil Computations on Short-Vector SIMD Architectures , 2011, CC.
[10] Sven Verdoolaege,et al. Polyhedral Extraction Tool , 2012 .
[11] G. Smith,et al. Numerical Solution of Partial Differential Equations: Finite Difference Methods , 1978 .
[12] Ron Y. Pinter,et al. Program optimization and parallelization using idioms , 1991, POPL '91.
[13] Zhaofang Wen,et al. Automatic Algorithm Recognition and Replacement: A New Approach to Program Optimization , 2000 .
[14] David A. Padua,et al. A Language for the Compact Representation of Multiple Program Versions , 2005, LCPC.
[15] Haichen Shen,et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning , 2018, OSDI.
[16] Gerda Janssens,et al. Scheduling for PPCG , 2017 .
[17] Sven Verdoolaege,et al. isl: An Integer Set Library for the Polyhedral Model , 2010, ICMS.
[18] Uday Bondhugula,et al. A practical automatic polyhedral parallelizer and locality optimizer , 2008, PLDI '08.
[19] Mary W. Hall,et al. CHiLL : A Framework for Composing High-Level Loop Transformations , 2007 .
[20] C.W. Kessler,et al. The SPARAMAT approach to automatic comprehension of sparse matrix computations , 1999, Proceedings Seventh International Workshop on Program Comprehension.
[21] Christian Lengauer,et al. Polly - Performing Polyhedral Optimizations on a Low-Level Intermediate Representation , 2012, Parallel Process. Lett..
[22] Allen Taflove,et al. Finite‐Difference Time‐Domain Analysis , 2005 .
[23] Michael Ferdman,et al. Architectural Support for Dynamic Linking , 2015 .
[24] Hongbin Zheng,et al. Polly – Polyhedral optimization in LLVM , 2012 .
[25] Toshio Nakatani,et al. Detection and global optimization of reduction operations for distributed parallel machines , 1996, ICS '96.
[26] Louis-Noël Pouchet,et al. Model-driven transformations for multi- and many-core CPUs , 2019, PLDI.
[27] W. Pugh,et al. A framework for unifying reordering transformations , 1993 .
[28] Jacobi. Pattern Driven Automatic Parallelization , 2004 .
[29] Kunle Olukotun,et al. Composition and Reuse with Compiled Domain-Specific Languages , 2013, ECOOP.
[30] Albert Cohen,et al. Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions , 2018, ArXiv.
[31] Beniamino Di Martino,et al. PAP Recognizer: a tool for automatic recognition of parallelizable patterns , 1996, WPC '96. 4th Workshop on Program Comprehension.
[32] Jacqueline Chame,et al. A script-based autotuning compiler system to generate high-performance CUDA code , 2013, TACO.
[33] Cédric Bastoul,et al. Opening polyhedral compiler's black box , 2016, 2016 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[34] Michael Ferdman,et al. Architectural Support for Dynamic Linking , 2015, ASPLOS.
[35] Paul Feautrier,et al. Polyhedron Model , 2011, Encyclopedia of Parallel Computing.
[36] Rudolf Eigenmann,et al. Idiom recognition in the Polaris parallelizing compiler , 1995, ICS '95.
[37] David A. Padua,et al. Locus: A System and a Language for Program Optimization , 2019, 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[38] Nicolas Vasilache,et al. GRAPHITE : Polyhedral Analyses and Optimizations for GCC , 2006 .
[39] Wojtek Kozaczynski,et al. Program Concept Recognition and Transformation , 1992, IEEE Trans. Software Eng..
[40] Kunle Olukotun,et al. Forge: generating a high performance DSL implementation from a declarative specification , 2013, GPCE '13.
[41] Vivek Sarkar,et al. Modeling the conflicting demands of parallelism and Temporal/Spatial locality in affine scheduling , 2018, CC.
[42] Christoph W. Kessler,et al. Extensible Recognition of Algorithmic Patterns in DSP Programs for Automatic Parallelization , 2012, International Journal of Parallel Programming.
[43] Uday Bondhugula,et al. PolyMage: Automatic Optimization for Image Processing Pipelines , 2015, ASPLOS.
[44] Albert Cohen,et al. Hybrid Hexagonal/Classical Tiling for GPUs , 2014, CGO '14.
[45] Jeffrey S. Vetter,et al. NVIDIA Tensor Core Programmability, Performance & Precision , 2018, 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[46] Kunle Olukotun,et al. Delite , 2014, ACM Trans. Embed. Comput. Syst..
[47] Uday Bondhugula,et al. A model for fusion and code motion in an automatic parallelizing compiler , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[48] Sven Verdoolaege. Counting Affine Calculator and Applications , 2011 .
[49] J. Ramanujam,et al. Optimistic Delinearization of Parametrically Sized Arrays , 2015, ICS.
[50] Eelco Visser,et al. Stratego/XT 0.17. A language and toolset for program transformation , 2008, Sci. Comput. Program..
[51] William Pugh,et al. Static analysis of upper and lower bounds on dependences and parallelism , 1994, TOPL.
[52] Lawrence Rauchwerger,et al. Polaris: Improving the Effectiveness of Parallelizing Compilers , 1994, LCPC.
[53] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.
[54] Uday Bondhugula,et al. PLuTo: A Practical and Fully Automatic Polyhedral Program Optimization System , 2015 .
[55] Frédo Durand,et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines , 2013, PLDI 2013.
[56] Allen Taflove,et al. Computational Electrodynamics the Finite-Difference Time-Domain Method , 1995 .
[57] Qing Yi,et al. POET: a scripting language for applying parameterized source‐to‐source program transformations , 2012, Softw. Pract. Exp..
[58] Tze Meng Low,et al. Analytical Modeling Is Enough for High-Performance BLIS , 2016, ACM Trans. Math. Softw..
[59] David Parello,et al. Semi-Automatic Composition of Loop Transformations for Deep Parallelism and Memory Hierarchies , 2006, International Journal of Parallel Programming.