Converting data-parallelism to task-parallelism by rewrites: purely functional programs across multiple GPUs
[1] Jacques Carette,et al. Finally tagless, partially evaluated: Tagless staged interpreters for simpler typed languages , 2007, Journal of Functional Programming.
[2] Thomas B. Jablin,et al. Automatic execution of single-GPU computations across multiple GPUs , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[3] Ryan Newton,et al. Design and evaluation of a compiler for embedded stream programs , 2008, LCTES '08.
[4] Kenneth E. Iverson,et al. A programming language , 1962, AIEE-IRE '62 (Spring).
[5] Ryan Newton,et al. Freeze after writing: quasi-deterministic parallel programming with LVars , 2014, POPL.
[6] Andy Gill,et al. Type-safe observable sharing in Haskell , 2009, Haskell.
[7] Kurt Keutzer,et al. Copperhead: compiling an embedded data parallel language , 2011, PPoPP '11.
[8] Kunle Olukotun,et al. Optimizing data structures in high-level programs: new directions for extensible compilers based on staging , 2013, POPL.
[9] Sebastian Burckhardt,et al. Two for the price of one: a model for parallel and incremental computation , 2011, OOPSLA '11.
[10] Sudhakar Yalamanchili,et al. Optimizing Data Warehousing Applications for GPUs Using Kernel Fusion/Fission , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum.
[11] J. Gregory Morrisett,et al. Nikola: embedding compiled GPU functions in Haskell , 2010, Haskell '10.
[12] Roman Leshchinskiy,et al. Stream fusion: from lists to streams to nothing at all , 2007, ICFP '07.
[13] Elizabeth R. Jessup,et al. Build to order linear algebra kernels , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[14] Guy E. Blelloch,et al. Scan primitives for vector computers , 1990, Proceedings SUPERCOMPUTING '90.
[15] William Thies,et al. StreamIt: A Language for Streaming Applications , 2002, CC.
[16] Amr Sabry,et al. The essence of compiling with continuations , 1993, PLDI '93.
[17] Emil Axelsson. A generic abstract syntax model for embedded languages , 2012, ICFP '12.
[18] Manuel M. T. Chakravarty,et al. Embedding Foreign Code , 2014, PADL.
[19] Sergei Gorlatch,et al. Towards High-Level Programming of Multi-GPU Systems Using the SkelCL Library , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum.
[20] Guy E. Blelloch,et al. A provable time and space efficient implementation of NESL , 1996, ICFP '96.
[21] Emery D. Berger,et al. Dthreads: efficient deterministic multithreading , 2011, SOSP.
[22] Hiroshi Nakamura,et al. Integrating Multi-GPU Execution in an OpenACC Compiler , 2013, 2013 42nd International Conference on Parallel Processing.
[23] Yao Zhang,et al. Scan primitives for GPU computing , 2007, GH '07.
[24] Guy E. Blelloch,et al. Vector Models for Data-Parallel Computing , 1990.
[25] Steven G. Johnson,et al. The Design and Implementation of FFTW3 , 2005, Proceedings of the IEEE.
[26] R. Govindarajan,et al. Fluidic Kernels: Cooperative Execution of OpenCL Programs on Multiple Heterogeneous Devices , 2014, CGO '14.
[27] Michael D. McCool,et al. Intel's Array Building Blocks: A retargetable, dynamic compiler and embedded language , 2011, International Symposium on Code Generation and Optimization (CGO 2011).
[28] Manuel M. T. Chakravarty,et al. Accelerating Haskell array codes with multicore GPUs , 2011, DAMP '11.
[29] Trevor L. McDonell. Optimising purely functional GPU programs , 2013, ICFP.
[30] Ryan Newton,et al. A meta-scheduler for the par-monad: composable scheduling for the heterogeneous cloud , 2012, ICFP.
[31] José M. F. Moura,et al. Spiral: A Generator for Platform-Adapted Libraries of Signal Processing Algorithms , 2004, Int. J. High Perform. Comput. Appl.
[32] Tao Yang,et al. List Scheduling With and Without Communication Delays , 1993, Parallel Comput.
[33] Scott A. Mahlke,et al. Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[34] Guy E. Blelloch,et al. NESL: A Nested Data-Parallel Language , 1992.
[35] Bo Joel Svensson,et al. Expressive array constructs in an embedded GPU kernel programming language , 2012, DAMP '12.
[36] Michael I. Gordon,et al. Exploiting coarse-grained task, data, and pipeline parallelism in stream programs , 2006, ASPLOS XII.
[37] Jack Dongarra,et al. Special Issue on Program Generation, Optimization, and Platform Adaptation , 2005, Proc. IEEE.
[38] Simon L. Peyton Jones,et al. Regular, shape-polymorphic, parallel arrays in Haskell , 2010, ICFP '10.
[39] Christoph W. Kessler,et al. SkePU: a multi-backend skeleton programming library for multi-GPU systems , 2010, HLPP '10.
[40] Matthias Felleisen,et al. Semantics Engineering with PLT Redex , 2009.
[41] Robert Atkey,et al. Unembedding domain-specific languages , 2009, Haskell.