A Memory Model for Scientific Algorithms on Graphics Processors
N.K. Govindaraju | S. Larsen | J. Gray | D. Manocha
[1] Kenneth E. Batcher,et al. Sorting networks and their applications , 1968, AFIPS Spring Joint Computing Conference.
[2] Michael Wolfe,et al. Iteration Space Tiling for Memory Hierarchies , 1987, PPSC.
[3] Alok Aggarwal,et al. The input/output complexity of sorting and related problems , 1988, CACM.
[4] Alan Jay Smith,et al. Evaluating Associativity in CPU Caches , 1989, IEEE Trans. Computers.
[5] R. Tolimieri,et al. Algorithms for Discrete Fourier Transform and Convolution , 1989 .
[6] Monica S. Lam,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[7] Ken Kennedy,et al. Compiler blockability of numerical algorithms , 1992, Proceedings Supercomputing '92.
[8] Keshav Pingali,et al. Access normalization: loop restructuring for NUMA computers , 1993, TOCS.
[9] David F. Bacon,et al. Compiler transformations for high-performance computing , 1994, CSUR.
[10] Kathryn S. McKinley,et al. Tile size selection using cache organization and data layout , 1995, PLDI '95.
[11] Michael Wolfe,et al. High performance compilers for parallel computing , 1995 .
[12] Keshav Pingali,et al. Data-centric multi-level blocking , 1997, PLDI '97.
[13] Keshav Pingali,et al. Data-centric multi-level blocking , 1997 .
[14] Anoop Gupta,et al. The Design and Analysis of a Cache Architecture for Texture Mapping , 1997, ISCA.
[15] Matteo Frigo,et al. Cache-oblivious algorithms , 1999, 40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039).
[16] Sandeep Sen,et al. Towards a theory of cache-efficient algorithms , 2000, SODA '00.
[17] Jeffrey Scott Vitter,et al. External memory algorithms and data structures: dealing with massive data , 2001, CSUR.
[18] David K. McAllister,et al. Fast Matrix Multiplies Using Graphics Hardware , 2001, ACM/IEEE SC 2001 Conference (SC'01).
[19] Martin Rumpf,et al. Using Graphics Cards for Quantized FEM Computations , 2001, VIIP.
[20] Anselmo Lastra,et al. Simulation of cloud dynamics on graphics hardware , 2003, HWWS '03.
[21] Pat Hanrahan,et al. Photon mapping on programmable graphics hardware , 2003, HWWS '03.
[22] Eitan Grinspun,et al. Sparse matrix solvers on the GPU: conjugate gradients and multigrid , 2003, ACM Trans. Graph..
[23] Ming C. Lin,et al. Visual simulation of ice crystal growth , 2003, SCA '03.
[24] Eitan Grinspun,et al. Sparse matrix solvers on the GPU , 2003 .
[25] Michael D. McCool,et al. Shader algebra , 2004, ACM Trans. Graph..
[26] Pat Hanrahan,et al. Understanding the efficiency of GPU algorithms for matrix-matrix multiplication , 2004, Graphics Hardware.
[27] Rüdiger Westermann,et al. UberFlow: a GPU-based particle engine , 2004, SIGGRAPH '04.
[28] Lars Arge,et al. Cache-Oblivious Data Structures , 2004, Handbook of Data Structures and Applications.
[29] Pat Hanrahan,et al. Brook for GPUs: stream computing on graphics hardware , 2004, ACM Trans. Graph..
[30] Arie E. Kaufman,et al. GPU Cluster for High Performance Computing , 2004, Proceedings of the ACM/IEEE SC2004 Conference.
[31] Dinesh Manocha,et al. Fast computation of database operations using graphics processors , 2005, SIGGRAPH Courses.
[32] Jens H. Krüger,et al. A Survey of General‐Purpose Computation on Graphics Hardware , 2007, Eurographics.
[33] Rüdiger Westermann,et al. Linear algebra operators for GPU implementation of numerical algorithms , 2003, SIGGRAPH Courses.
[34] Dinesh Manocha,et al. LU-GPU: Efficient Algorithms for Solving Dense Linear Systems on Graphics Hardware , 2005, ACM/IEEE SC 2005 Conference (SC'05).
[35] Dinesh Manocha,et al. Fast and approximate stream mining of quantiles and frequencies using graphics processors , 2005, SIGMOD '05.
[36] Dinesh Manocha,et al. GPUTeraSort: high performance graphics co-processor sorting for large database management , 2006, SIGMOD Conference.