Parallel associative reductions in Halide
[1] Vikram S. Adve, et al. LLVM: a compilation framework for lifelong program analysis & transformation, 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004.
[2] Paul Feautrier, et al. Dataflow analysis of array and scalar references, 1991, International Journal of Parallel Programming.
[3] Jonathan T. Barron, et al. Burst photography for high dynamic range and low-light imaging on mobile cameras, 2016, ACM Trans. Graph.
[4] Richard Kenner, et al. Eliminating branches using a superoptimizer and the GNU C compiler, 1992, PLDI '92.
[5] Frédo Durand, et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines, 2013, PLDI 2013.
[6] Alexander Aiken, et al. Stochastic superoptimization, 2012, ASPLOS '13.
[7] Armando Solar-Lezama, et al. MSL: A Synthesis Enabled Language for Distributed Implementations, 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[8] Emina Torlak, et al. Growing solver-aided languages with rosette, 2013, Onward!.
[9] Jonathan Ragan-Kelley, et al. Automatically scheduling halide image processing pipelines, 2016, ACM Trans. Graph.
[10] L. Dagum, et al. OpenMP: an industry standard API for shared-memory programming, 1998.
[11] Albert Cohen, et al. Reduction drawing: Language constructs and polyhedral compilation for reductions on GPUs, 2016, 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT).
[12] Aws Albarghouthi, et al. MapReduce program synthesis, 2016, PLDI.
[13] Chun Chen, et al. Loop Transformation Recipes for Code Generation and Auto-Tuning, 2009, LCPC.
[14] Wei-Ngan Chin, et al. Deriving efficient parallel programs for complex recurrences, 1997, PASCO '97.
[15] Dinakar Dhurjati, et al. Scaling up Superoptimization, 2016.
[16] Nikolaj Bjørner, et al. Z3: An Efficient SMT Solver, 2008, TACAS.
[17] Henry Massalin. Superoptimizer: a look at the smallest program, 1987, ASPLOS 1987.
[18] Bradley C. Kuszmaul, et al. Cilk: an efficient multithreaded runtime system, 1995, PPOPP '95.
[19] Akimasa Morihata, et al. Automatic inversion generates divide-and-conquer parallel programs, 2007, PLDI '07.
[20] David Parello, et al. Semi-Automatic Composition of Loop Transformations for Deep Parallelism and Memory Hierarchies, 2006, International Journal of Parallel Programming.
[21] John Regehr, et al. Provably correct peephole optimizations with alive, 2015, PLDI.
[22] François Irigoin, et al. Supernode partitioning, 1988, POPL '88.
[23] Armando Solar-Lezama, et al. Program synthesis by sketching, 2008.