Cache-efficient numerical algorithms using graphics hardware
[1] R. Yavne. An Economical Method for Calculating the Discrete Fourier Transform, 1968.
[2] Michael D. McCool, et al. Shader algebra, 2004, ACM Trans. Graph.
[3] Rin-ichiro Taniguchi, et al. Real-time image processing on IEEE1394-based PC cluster, 2001, Proceedings 15th International Parallel and Distributed Processing Symposium (IPDPS 2001).
[4] H. T. Kung, et al. Sorting on a mesh-connected parallel computer, 1977, CACM.
[5] Michael Wolfe, et al. High performance compilers for parallel computing, 1995.
[6] Kenneth Moreland, et al. The FFT on a GPU, 2003, HWWS '03.
[7] S. Winograd. On computing the Discrete Fourier Transform, 1976, Proceedings of the National Academy of Sciences of the United States of America.
[8] Murray Cole, et al. Algorithmic Skeletons, 2006, Research Directions in Parallel Functional Programming.
[9] Kenneth E. Batcher, et al. Sorting networks and their applications, 1968, AFIPS Spring Joint Computing Conference.
[10] David H. Bailey. A High-Performance FFT Algorithm for Vector Supercomputers, 1987, PPSC.
[11] David A. Patterson, et al. Computer Architecture: A Quantitative Approach, 1990.
[12] Takashi Matsuyama, et al. Real-time active 3D shape reconstruction for 3D video, 2003, Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA 2003).
[13] Michael E. Saks, et al. The periodic balanced sorting network, 1989, JACM.
[14] Joel Falcou, et al. An object oriented SIMD library, 2005.
[15] Jens H. Krüger, et al. A Survey of General-Purpose Computation on Graphics Hardware, 2007, Eurographics.
[16] Keshav Pingali, et al. Access normalization: loop restructuring for NUMA computers, 1993, TOCS.
[17] Dinesh Manocha, et al. GPUTeraSort: high performance graphics co-processor sorting for large database management, 2006, SIGMOD Conference.
[18] Bruno Raffin, et al. A Distributed Approach for Real Time 3D Modeling, 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.
[19] Bryan Chan, et al. Shader algebra, 2004, SIGGRAPH 2004.
[20] David K. McAllister, et al. Fast matrix multiplies using graphics hardware, 2001, SC.
[21] Anoop Gupta, et al. The Design and Analysis of a Cache Architecture for Texture Mapping, 1997, ISCA.
[22] Pat Hanrahan, et al. Photon mapping on programmable graphics hardware, 2003, HWWS '03.
[23] Pat Hanrahan, et al. Understanding the efficiency of GPU algorithms for matrix-matrix multiplication, 2004, Graphics Hardware.
[24] Dinesh Manocha, et al. Fast and approximate stream mining of quantiles and frequencies using graphics processors, 2005, SIGMOD '05.
[25] Ken Kennedy, et al. Compiler blockability of numerical algorithms, 1992, Proceedings Supercomputing '92.
[26] Daniel B. Horn, et al. Assessment of Graphic Processing Units (GPUs) for Department of Defense (DoD) Digi, 2005.
[27] David Tarditi, et al. Accelerator: using data parallelism to program GPUs for general-purpose uses, 2006, ASPLOS XII.
[28] Michael Wolfe, et al. Iteration Space Tiling for Memory Hierarchies, 1987, PPSC.
[29] N.K. Govindaraju, et al. A Memory Model for Scientific Algorithms on Graphics Processors, 2006, ACM/IEEE SC 2006 Conference (SC'06).
[30] David H. Bailey. A high-performance fast Fourier transform algorithm for the Cray-2, 1987, The Journal of Supercomputing.
[31] Pat Hanrahan, et al. Brook for GPUs: stream computing on graphics hardware, 2004, SIGGRAPH 2004.
[32] Samuel Williams, et al. The Landscape of Parallel Computing Research: A View from Berkeley, 2006.
[33] David H. Bailey, et al. FFTs in external or hierarchical memory, 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).
[34] Clifford Stein, et al. Introduction to Algorithms, 2nd edition, 2001.
[35] Rüdiger Westermann, et al. UberFlow: a GPU-based particle engine, 2004, SIGGRAPH '04.
[36] R. K. Shyamasundar, et al. Introduction to algorithms, 1996.
[37] Keshav Pingali, et al. Data-centric multi-level blocking, 1997, PLDI '97.
[38] Christopher G. Harris, et al. A Combined Corner and Edge Detector, 1988, Alvey Vision Conference.
[39] Kathryn S. McKinley, et al. Tile size selection using cache organization and data layout, 1995, PLDI '95.
[40] Joel Falcou, et al. E.V.E., An Object Oriented SIMD Library, 2005, Scalable Comput. Pract. Exp.
[41] A. Verri, et al. A compact algorithm for rectification of stereo pairs, 2000.
[42] Alan Jay Smith, et al. Evaluating Associativity in CPU Caches, 1989, IEEE Trans. Computers.
[43] R. Tolimieri, et al. Algorithms for Discrete Fourier Transform and Convolution, 1989.
[44] Monica S. Lam, et al. The cache performance and optimizations of blocked algorithms, 1991, ASPLOS IV.
[45] Michael E. Wolf, et al. The cache performance and optimizations of blocked algorithms, 1991, ASPLOS IV.
[46] Richard E. Ladner, et al. The influence of caches on the performance of sorting, 1997, SODA '97.