Automatically harnessing sparse acceleration
Michael F. P. O'Boyle | Bruce Collie | Philip Ginsbach
[1] Paolo Bientinesi, et al. Program generation for small-scale linear algebra applications, 2018, CGO.
[2] Victor Eijkhout, et al. An iterative solver benchmark, 2001, Sci. Program.
[3] Arturo González-Escribano, et al. Multi-device Controllers: A Library to Simplify Parallel Heterogeneous Programming, 2019, International Journal of Parallel Programming.
[4] Yousef Saad, et al. Iterative methods for sparse linear systems, 2003.
[5] Sharon L. Wolchik. 1989, 2009.
[6] Y. Saad, et al. Krylov Subspace Methods on Supercomputers, 1989.
[7] Albert Cohen, et al. A polyhedral compilation framework for loops with dynamic data-dependent bounds, 2018, CC.
[8] Katherine Yelick, et al. OSKI: A library of automatically tuned sparse matrix kernels, 2005.
[9] Sebastian Hack, et al. Polly's Polyhedral Scheduling in the Presence of Reductions, 2015, ArXiv.
[10] J. Doye, et al. The Double-Funnel Energy Landscape of the 38-Atom Lennard-Jones Cluster, 1998, cond-mat/9808265.
[11] Lawrence Rauchwerger, et al. The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization, 1995, PLDI '95.
[12] Markus Püschel, et al. A Basic Linear Algebra Compiler, 2014, CGO '14.
[13] David H. Bailey, et al. The NAS parallel benchmarks summary and preliminary results, 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[14] Toshio Nakatani, et al. Detection and global optimization of reduction operations for distributed parallel machines, 1996, ICS '96.
[15] Cristina V. Lopes. SIGPLAN treasurer's report, 2013, SIGP.
[16] D. Wales. Discrete path sampling, 2002.
[17] Jonathan Ragan-Kelley, et al. Automatically scheduling halide image processing pipelines, 2016, ACM Trans. Graph.
[18] Yosi Ben-Asher, et al. Streamlining Whole Function Vectorization in C Using Higher Order Vector Semantics, 2015, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop.
[19] David A. Ham, et al. An Algorithm for the Optimization of Finite Element Integration Loops, 2016, ACM Trans. Math. Softw.
[20] Yi Yang, et al. BLASX: A High Performance Level-3 BLAS Library for Heterogeneous Multi-GPU Computing, 2015, ICS.
[21] Pierre Jouvelot, et al. A unified semantic approach for the vectorization and parallelization of generalized reductions, 1989, ICS '89.
[22] Michael F. P. O'Boyle, et al. CAnDL: a domain specific language for compiler analysis, 2018, CC.
[23] Dan Grossman. SIGPLAN education board and related activities report, 2011.
[24] Gabriel Rodríguez, et al. Generating piecewise-regular code from irregular structures, 2019, PLDI.
[25] Sylvain Paris, et al. Helium: lifting high-performance stencil kernels from stripped x86 binaries to halide DSL code, 2015, PLDI.
[26] Rudolf Eigenmann, et al. OpenMP to GPGPU: a compiler framework for automatic translation and optimization, 2009, PPoPP '09.
[27] Francky Catthoor, et al. Polyhedral parallel code generation for CUDA, 2013, TACO.
[28] Frédo Durand, et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines, 2013, PLDI 2013.
[29] Wen-mei W. Hwu, et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing, 2012.
[30] Nectarios Koziris, et al. SparseX: A Library for High-Performance Sparse Matrix-Vector Multiplication on Multicore Platforms, 2018, ACM Trans. Math. Softw.
[31] J. Demmel, et al. Sun Microsystems, 1996.
[32] Timothy A. Davis, et al. The university of Florida sparse matrix collection, 2011, TOMS.
[33] J. Ramanujam, et al. A framework for enhancing data reuse via associative reordering, 2014, PLDI.
[34] Allan L. Fisher, et al. Parallelizing complex scans and reductions, 1994, PLDI '94.
[35] Alvin Cheung, et al. Verified lifting of stencil computations, 2016, PLDI.
[36] David A. Patterson, et al. The GAP Benchmark Suite, 2015, ArXiv.
[37] David J. Wales, et al. Exploiting sparsity in free energy basin-hopping, 2017.
[38] Kunle Olukotun, et al. Delite, 2014, ACM Trans. Embed. Comput. Syst.
[39] Christian Lengauer, et al. Polly - Performing Polyhedral Optimizations on a Low-Level Intermediate Representation, 2012, Parallel Process. Lett.
[40] Michael F. P. O'Boyle, et al. Type-Directed Program Synthesis and Constraint Generation for Library Portability, 2019, 2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[41] Paul Feautrier, et al. Scheduling reductions, 1994, ICS '94.
[42] Philippe Clauss, et al. The Polyhedral Model of Nonlinear Loops, 2016, ACM Trans. Archit. Code Optim.
[43] David I. August, et al. Automatic CPU-GPU communication management and optimization, 2011, PLDI '11.
[44] Arturo González-Escribano, et al. Supporting the Xeon Phi Coprocessor in a Heterogeneous Programming Model, 2017, Euro-Par.
[45] Joel H. Saltz, et al. Run-time parallelization and scheduling of loops, 1989, SPAA '89.
[46] Rudolf Eigenmann, et al. Idiom recognition in the Polaris parallelizing compiler, 1995, ICS '95.
[47] Tomofumi Yuki, et al. Sparse computation data dependence simplification for efficient compiler-generated inspectors, 2019, PLDI.
[48] Ron Y. Pinter, et al. Program optimization and parallelization using idioms, 1991, POPL '91.
[49] Elnar Hajiyev, et al. PENCIL: A Platform-Neutral Compute Intermediate Language for Accelerator Programming, 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).
[50] Michael F. P. O'Boyle, et al. Portable and Transparent Host-Device Communication Optimization for GPGPU Environments, 2014, CGO '14.
[51] Chi-Chung Lam, et al. On Optimizing a Class of Multi-Dimensional Loops with Reductions for Parallel Execution, 1997, Parallel Process. Lett.
[52] Michael F. P. O'Boyle, et al. Automatic Matching of Legacy Code to Heterogeneous APIs: An Idiomatic Approach, 2018, ASPLOS.
[53] Shoaib Kamil, et al. Parallel associative reductions in Halide, 2017, 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[54] David A. Bader, et al. Graphs, Matrices, and the GraphBLAS: Seven Good Reasons, 2015, ICCS.
[55] Albert Cohen, et al. Vapor SIMD: Auto-vectorize once, run everywhere, 2011, International Symposium on Code Generation and Optimization (CGO 2011).
[56] Gautam Gupta. Simplifying reductions, 2006, POPL '06.