Michel Steuwer | Christophe Dubach | Christian Fensch
[1] Kurt Keutzer, et al. Copperhead: compiling an embedded data parallel language, 2011, PPoPP '11.
[2] Roman Leshchinskiy, et al. Stream fusion: from lists to streams to nothing at all, 2007, ICFP '07.
[3] Abhishek Udupa, et al. Software Pipelined Execution of Stream Programs on GPUs, 2009, International Symposium on Code Generation and Optimization (CGO '09).
[4] Kunle Olukotun, et al. Implementing Domain-Specific Languages for Heterogeneous Parallel Computing, 2011, IEEE Micro.
[5] Markus Püschel, et al. Bandit-based optimization on graphs with application to library performance tuning, 2009, ICML '09.
[6] Sebastian Hack, et al. Whole-function vectorization, 2011, International Symposium on Code Generation and Optimization (CGO '11).
[7] Bradley C. Kuszmaul, et al. Cilk: an efficient multithreaded runtime system, 1995, PPoPP '95.
[8] Saman P. Amarasinghe, et al. Portable performance on heterogeneous architectures, 2013, ASPLOS '13.
[9] James Reinders, et al. Intel® Threading Building Blocks, 2008.
[10] Murray Cole, et al. Algorithmic Skeletons: Structured Management of Parallel Computation, 1989.
[11] Thomas B. Jablin, et al. Triolet: a programming system that unifies algorithmic skeleton interfaces for high-performance cluster computing, 2014, PPoPP '14.
[12] Kunle Olukotun, et al. A domain-specific approach to heterogeneous parallelism, 2011, PPoPP '11.
[13] Rudolf Eigenmann, et al. OpenMP to GPGPU: a compiler framework for automatic translation and optimization, 2009, PPoPP '09.
[14] Sean Lee, et al. NOVA: A Functional Language for Data Parallelism, 2014, ARRAY@PLDI.
[15] Simon Peyton Jones, et al. Playing by the rules: rewriting as a practical optimisation technique in GHC, 2001.
[16] Frank Mueller, et al. HiDP: A hierarchical data parallel language, 2013, IEEE/ACM International Symposium on Code Generation and Optimization (CGO '13).
[17] Tarek S. Abdelrahman, et al. hiCUDA: High-Level GPGPU Programming, 2011, IEEE Transactions on Parallel and Distributed Systems.
[18] Francky Catthoor, et al. Polyhedral parallel code generation for CUDA, 2013, TACO.
[19] David F. Bacon, et al. Compiling a high-level language for GPUs: (via language support for architectures and compilers), 2012, PLDI '12.
[20] Frédo Durand, et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines, 2013, PLDI '13.
[21] Scott A. Mahlke, et al. Sponge: portable stream programming on graphics engines, 2011, ASPLOS XVI.
[22] Sanjay Ghemawat, et al. MapReduce: simplified data processing on large clusters, 2008, CACM.
[23] Sergei Gorlatch, et al. SkelCL - A Portable Skeleton Library for High-Level GPU Programming, 2011, IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum.
[24] Alan Edelman, et al. PetaBricks: a language and compiler for algorithmic choice, 2009, PLDI '09.
[25] Francisco de Sande, et al. accULL: An OpenACC Implementation with CUDA and OpenCL Support, 2012, Euro-Par.
[26] William Thies, et al. StreamIt: A Language for Streaming Applications, 2002, CC.
[27] Albert Cohen, et al. Hybrid Hexagonal/Classical Tiling for GPUs, 2014, CGO '14.
[28] Franz Franchetti, et al. SPIRAL: Code Generation for DSP Transforms, 2005, Proceedings of the IEEE.
[29] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.
[30] Sunita Chandrasekaran, et al. Reduction Operations in Parallel Loops for GPGPUs, 2014, PMAM '14.
[31] Scott A. Mahlke, et al. Paraprox: pattern-based approximation for data parallel applications, 2014, ASPLOS.
[32] Vijay Saraswat, et al. GPU programming in a high level language: compiling X10 to CUDA, 2011, X10 '11.
[33] Franz Franchetti, et al. Operator Language: A Program Generation Framework for Fast Kernels, 2009, DSL.
[34] Sriram Krishnamoorthy, et al. On the Use of Term Rewriting for Performance Optimization of Legacy HPC Applications, 2012, 41st International Conference on Parallel Processing (ICPP).
[35] Kunle Olukotun, et al. A Heterogeneous Parallel Framework for Domain-Specific Languages, 2011, International Conference on Parallel Architectures and Compilation Techniques (PACT).
[36] Weng-Fai Wong, et al. Scalable framework for mapping streaming applications onto multi-GPU systems, 2012, PPoPP '12.
[37] Markus Püschel, et al. A Basic Linear Algebra Compiler, 2014, CGO '14.
[38] Collin McCurdy, et al. The Scalable Heterogeneous Computing (SHOC) benchmark suite, 2010, GPGPU-3.
[39] Basilio B. Fraguela, et al. An Algorithm Template for Domain-Based Parallel Irregular Algorithms, 2014, International Journal of Parallel Programming.
[40] Manuel M. T. Chakravarty, et al. Accelerating Haskell array codes with multicore GPUs, 2011, DAMP '11.
[41] Trevor L. McDonell. Optimising purely functional GPU programs, 2013, ICFP.