Optimising purely functional GPU programs
[1] John F. Canny. A Computational Approach to Edge Detection , 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[2] Samuel Williams,et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[3] Conal Elliott,et al. Programming graphics processors functionally , 2004, Haskell '04.
[4] Simon L. Peyton Jones,et al. Regular, shape-polymorphic, parallel arrays in Haskell , 2010, ICFP '10.
[5] Simon L. Peyton Jones,et al. Stretching the Storage Manager: Weak Pointers and Stable Names in Haskell , 1999, IFL.
[6] Simon L. Peyton Jones,et al. Exploiting vector instructions with generalized stream fusion , 2013, ICFP.
[7] Guy E. Blelloch,et al. NESL: A Nested Data-Parallel Language , 1992 .
[8] Kunle Olukotun,et al. Optimizing data structures in high-level programs: new directions for extensible compilers based on staging , 2013, POPL.
[9] Andy Gill,et al. Type-safe observable sharing in Haskell , 2009, Haskell.
[10] Yao Zhang,et al. Scan primitives for GPU computing , 2007, GH '07.
[11] Jos Stam. Stable fluids , 1999, SIGGRAPH.
[12] Bo Joel Svensson,et al. Expressive array constructs in an embedded GPU kernel programming language , 2012, DAMP '12.
[13] J. Gregory Morrisett,et al. Nikola: embedding compiled GPU functions in Haskell , 2010, Haskell '10.
[14] Guy E. Blelloch,et al. Prefix sums and their applications , 1990 .
[15] Hideya Iwasaki,et al. A Skeletal Parallel Framework with Fusion Optimizer for GPGPU Programming , 2009, APLAS.
[16] Simon L. Peyton Jones,et al. Guiding parallel array fusion with indexed types , 2012, Haskell '12.
[17] Michael Garland,et al. Implementing sparse matrix-vector multiplication on throughput-oriented processors , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[18] Guy E. Blelloch,et al. Scan primitives for vector computers , 1990, Proceedings SUPERCOMPUTING '90.
[19] Manuel M. T. Chakravarty,et al. On the Distributed Implementation of Aggregate Data Structures by Program Transformation , 1999, IPPS/SPDP Workshops.
[20] Lars Bergstrom,et al. Nested data-parallelism on the GPU , 2012, ICFP '12.
[21] Emil Axelsson. A generic abstract syntax model for embedded languages , 2012, ICFP '12.
[22] Gagan Agrawal,et al. An integer programming framework for optimizing shared memory use on GPUs , 2010, 2010 International Conference on High Performance Computing.
[23] Kiminori Matsuzaki,et al. Implementing Fusion-Equipped Parallel Skeletons by Expression Templates , 2009, IFL.
[24] Michael Garland,et al. Designing efficient sorting algorithms for manycore GPUs , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[25] Manuel M. T. Chakravarty,et al. Accelerating Haskell array codes with multicore GPUs , 2011, DAMP '11.
[26] Roman Leshchinskiy,et al. Stream fusion: from lists to streams to nothing at all , 2007, ICFP '07.
[27] Mary Sheeran,et al. Obsidian: GPU Programming in Haskell , 2011 .
[28] Robert Atkey,et al. Unembedding domain-specific languages , 2009, Haskell.
[29] Gabriele Keller,et al. Efficient parallel stencil convolution in Haskell , 2011, Haskell '11.
[30] Maarten M. Fokkinga,et al. Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire , 1991, FPCA.
[31] Simon L. Peyton Jones,et al. Secrets of the Glasgow Haskell Compiler inliner , 2002, Journal of Functional Programming.
[32] Bradford Larsen,et al. Simple optimizations for an applicative array language for graphics processors , 2011, DAMP '11.
[33] Christoph-Simon Senjak. Haskell Beats C Using Generalized Stream Fusion , 2013 .
[34] Simon L. Peyton Jones,et al. A short cut to deforestation , 1993, FPCA '93.