Shoaib Kamil | Emanuele Del Sozzo | Riyadh Baghdadi | Saman P. Amarasinghe | Patricia Suriana | Jessica Ray | Malek Ben Romdhane
[1] P. Feautrier. Array expansion , 1988 .
[2] Frédo Durand,et al. Decoupling algorithms from schedules for easy optimization of image processing pipelines , 2012, ACM Trans. Graph..
[3] Frédo Durand,et al. Compiling high performance recursive filters , 2015, HPG '15.
[4] Pat Hanrahan,et al. Darkroom , 2014, ACM Trans. Graph..
[5] Feng Li,et al. Elimination of memory-based dependences for loop-nest optimization and parallelization , 2011 .
[6] Albert Cohen,et al. The Polyhedral Model Is More Widely Applicable Than You Think , 2010, CC.
[7] Wojciech Matusik,et al. Simit , 2016, ACM Trans. Graph..
[8] Martin Odersky,et al. Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs , 2010, GPCE '10.
[9] Gordon L. Kindlmann,et al. Diderot: a parallel DSL for image analysis and visualization , 2012, PLDI.
[10] Frédéric Vivien,et al. A unified framework for schedule and storage optimization , 2001, PLDI '01.
[11] Chun Chen,et al. Loop Transformation Recipes for Code Generation and Auto-Tuning , 2009, LCPC.
[12] Albert Cohen,et al. Violated dependence analysis , 2006, ICS '06.
[13] Paul Feautrier,et al. Dataflow analysis of array and scalar references , 1991, International Journal of Parallel Programming.
[14] Xuan Yang,et al. Programming Heterogeneous Systems from an Image Processing DSL , 2016, ACM Trans. Archit. Code Optim..
[15] Uday Bondhugula,et al. A practical automatic polyhedral parallelizer and locality optimizer , 2008, PLDI '08.
[16] Mary W. Hall,et al. CHiLL: A Framework for Composing High-Level Loop Transformations , 2007 .
[17] Monica S. Lam,et al. Array-data flow analysis and its use in array privatization , 1993, POPL '93.
[18] Monica S. Lam,et al. Data Dependence and Data-Flow Analysis of Arrays , 1992, LCPC.
[19] Frédo Durand,et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines , 2013, PLDI.
[20] Ken Kennedy,et al. Optimizing Compilers for Modern Architectures: A Dependence-based Approach , 2001 .
[21] Albert Cohen,et al. Parallelization via Constrained Storage Mapping Optimization , 1999, ISHPC.
[22] Sven Verdoolaege,et al. isl: An Integer Set Library for the Polyhedral Model , 2010, ICMS.
[23] Alain Darte,et al. New Complexity Results on Array Contraction and Related Problems , 2005, J. VLSI Signal Process..
[24] Martín Abadi,et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.
[25] Albert Cohen,et al. Optimization of storage mappings for parallel programs , 1988 .
[26] David A. Padua,et al. Automatic Array Privatization , 1993, Compiler Optimizations for Scalable Parallel Systems Languages.
[27] Alan Edelman,et al. Julia: A Fresh Approach to Numerical Computing , 2014, SIAM Rev..
[28] Uday Bondhugula,et al. Data Layout Transformation for Enhancing Data Locality on NUCA Chip Multiprocessors , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[29] Anders Logg,et al. Unified form language: A domain-specific language for weak formulations of partial differential equations , 2012, TOMS.
[30] Hari Angepat,et al. Configurable Clouds , 2017, IEEE Micro.
[31] Paul Feautrier,et al. Automatic Storage Management for Parallel Programs , 1998, Parallel Comput..
[32] David Parello,et al. Semi-Automatic Composition of Loop Transformations for Deep Parallelism and Memory Hierarchies , 2006, International Journal of Parallel Programming.
[33] Guy E. Blelloch,et al. Space profiling for parallel functional programs , 2008, ICFP.
[34] Sanjay V. Rajopadhye,et al. Optimizing memory usage in the polyhedral model , 2000, TOPLAS.
[35] Guy E. Blelloch,et al. Cache and I/O efficient functional algorithms , 2013, POPL.
[36] Elnar Hajiyev,et al. PENCIL: A Platform-Neutral Compute Intermediate Language for Accelerator Programming , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).
[37] Mark N. Wegman,et al. Efficiently computing static single assignment form and the control dependence graph , 1991, TOPLAS.
[38] Vikram S. Adve,et al. LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004..
[39] Wei Huang,et al. Design of High Performance MVAPICH2: MPI2 over InfiniBand , 2006, Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06).
[40] Albert Cohen,et al. PENCIL Language Specification , 2015 .
[41] Mary W. Hall,et al. Detecting Coarse-Grain Parallelism Using an Interprocedural Parallelizing Compiler , 1995, Proceedings of the IEEE/ACM SC95 Conference.
[42] Zhiyuan Li. Array privatization for parallel execution of loops , 1992, ICS.
[43] Marco D. Santambrogio,et al. A Common Backend for Hardware Acceleration on FPGA , 2017, 2017 IEEE International Conference on Computer Design (ICCD).
[44] Uday Bondhugula,et al. PolyMage: Automatic Optimization for Image Processing Pipelines , 2015, ASPLOS.
[45] Michael F. P. O'Boyle,et al. Towards a holistic approach to auto-parallelization: integrating profile-driven parallelism detection and machine-learning based mapping , 2009, PLDI '09.
[46] Shoaib Kamil,et al. Distributed Halide , 2016, PPoPP.
[47] Manish Gupta,et al. On privatization of variables for data-parallel execution , 1997, Proceedings 11th International Parallel Processing Symposium.
[48] Kunle Olukotun,et al. A domain-specific approach to heterogeneous parallelism , 2011, PPoPP '11.
[49] Sean Lee,et al. NOVA: A Functional Language for Data Parallelism , 2014, ARRAY@PLDI.
[50] Monica S. Lam,et al. A Loop Transformation Theory and an Algorithm to Maximize Parallelism , 1991, IEEE Trans. Parallel Distributed Syst..
[51] Mikel Luján,et al. OoLALA: an object oriented analysis and design of numerical linear algebra , 2000, OOPSLA '00.
[52] Canqun Yang,et al. MilkyWay-2 supercomputer: system and application , 2014, Frontiers of Computer Science.
[53] Chao-Tung Yang,et al. Hybrid CUDA, OpenMP, and MPI parallel programming on multicore GPU clusters , 2011, Comput. Phys. Commun..