Automatic Matching of Legacy Code to Heterogeneous APIs: An Idiomatic Approach
Michael F. P. O'Boyle | Bruno Bodin | Michel Steuwer | Christophe Dubach | Philip Ginsbach | Toomas Remmelg
[1] Vikram S. Adve,et al. LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004.
[2] Francky Catthoor,et al. Polyhedral parallel code generation for CUDA , 2013, TACO.
[3] Uday Bondhugula,et al. PolyMage: Automatic Optimization for Image Processing Pipelines , 2015, ASPLOS.
[4] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[5] J. Ramanujam,et al. A framework for enhancing data reuse via associative reordering , 2014, PLDI.
[6] Keshav Pingali,et al. The program structure tree: computing control regions in linear time , 1994, PLDI '94.
[7] Frédo Durand,et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines , 2013, PLDI 2013.
[8] Lawrence Rauchwerger,et al. An Adaptive Algorithm Selection Framework for Reduction Parallelization , 2006, IEEE Transactions on Parallel and Distributed Systems.
[9] Alvin Cheung,et al. Verified lifting of stencil computations , 2016, PLDI.
[10] Michael Garland,et al. Architecture-Adaptive Code Variant Tuning , 2016, ASPLOS.
[11] Manuel M. T. Chakravarty,et al. Accelerating Haskell array codes with multicore GPUs , 2011, DAMP '11.
[12] Albert Cohen,et al. Reduction drawing: Language constructs and polyhedral compilation for reductions on GPUs , 2016, 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT).
[13] Trevor L. McDonell. Optimising purely functional GPU programs , 2013, ICFP.
[14] Emilio L. Zapata,et al. A compiler method for the parallel execution of irregular reductions in scalable shared memory multiprocessors , 2000, ICS '00.
[15] Albert Cohen,et al. The Polyhedral Model Is More Widely Applicable Than You Think , 2010, CC.
[16] Michel Steuwer,et al. LIFT: A functional data-parallel IR for high-performance GPU code generation , 2017, 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[17] Anders Logg,et al. Unified form language: A domain-specific language for weak formulations of partial differential equations , 2012, TOMS.
[18] Martin Odersky,et al. Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs , 2010, GPCE '10.
[19] Gautam Gupta. Simplifying reductions , 2006, POPL '06.
[20] Ron Y. Pinter,et al. Program optimization and parallelization using idioms , 1991, POPL '91.
[21] Emilio L. Zapata,et al. An analytical model of locality-based parallel irregular reductions , 2008, Parallel Comput.
[22] Pierre Jouvelot,et al. A unified semantic approach for the vectorization and parallelization of generalized reductions , 1989, ICS '89.
[23] Flemming Nielson,et al. Principles of Program Analysis , 1999, Springer Berlin Heidelberg.
[24] José M. Andión. Compilation techniques for automatic extraction of parallelism and locality in heterogeneous architectures , 2015 .
[25] Toshio Nakatani,et al. Detection and global optimization of reduction operations for distributed parallel machines , 1996, ICS '96.
[26] Jason Merrill. GENERIC and GIMPLE: A new tree representation for entire functions , 2003 .
[27] Franz Franchetti,et al. Operator Language: A Program Generation Framework for Fast Kernels , 2009, DSL.
[28] Martin Odersky,et al. Spiral in scala: towards the systematic construction of generators for performance libraries , 2014, GPCE '13.
[29] José Manuel Andión Fernández. Compilation techniques for automatic extraction of parallelism and locality in heterogeneous architectures , 2015 .
[30] Emilio L. Zapata,et al. Optimization techniques for parallel irregular reductions , 2003, J. Syst. Archit.
[31] Anna Philippou,et al. Tools and Algorithms for the Construction and Analysis of Systems , 2018, Lecture Notes in Computer Science.
[32] Sebastian Hack,et al. Polly's Polyhedral Scheduling in the Presence of Reductions , 2015, ArXiv.
[33] Kunle Olukotun,et al. A Heterogeneous Parallel Framework for Domain-Specific Languages , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[34] Albert Cohen,et al. PENCIL Language Specification , 2015 .
[35] David I. August,et al. Automatic CPU-GPU communication management and optimization , 2011, PLDI '11.
[36] Michael F. P. O'Boyle,et al. Discovery and exploitation of general reductions: A constraint based approach , 2017, 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[37] Allan L. Fisher,et al. Parallelizing complex scans and reductions , 1994, PLDI '94.
[38] Saman P. Amarasinghe,et al. Portable performance on heterogeneous architectures , 2013, ASPLOS '13.
[39] Kurt Keutzer,et al. Copperhead: compiling an embedded data parallel language , 2011, PPoPP '11.
[40] Paul Feautrier,et al. Scheduling reductions , 1994, ICS '94.
[41] Sylvain Paris,et al. Helium: lifting high-performance stencil kernels from stripped x86 binaries to halide DSL code , 2015, PLDI.
[42] Kunle Olukotun,et al. A domain-specific approach to heterogeneous parallelism , 2011, PPoPP '11.
[43] Rudolf Eigenmann,et al. OpenMP to GPGPU: a compiler framework for automatic translation and optimization , 2009, PPoPP '09.
[44] Sean Lee,et al. NOVA: A Functional Language for Data Parallelism , 2014, ARRAY@PLDI.
[45] Gagan Agrawal,et al. Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations , 2010, ICS '10.
[46] Chi-Chung Lam,et al. On Optimizing a Class of Multi-Dimensional Loops with Reductions for Parallel Execution , 1997, Parallel Process. Lett.
[47] Sam Lindley,et al. Generating performance portable code using rewrite rules: from high-level functional expressions to high-performance OpenCL code , 2015, ICFP.
[48] J. Ramanujam,et al. Automatic C-to-CUDA Code Generation for Affine Programs , 2010, CC.
[49] Sergei Gorlatch,et al. High performance stencil code generation with Lift , 2018, CGO.
[50] Rudolf Eigenmann,et al. Idiom recognition in the Polaris parallelizing compiler , 1995, ICS '95.
[51] Gagan Agrawal,et al. Porting irregular reductions on heterogeneous CPU-GPU configurations , 2011, 2011 18th International Conference on High Performance Computing.