A performance prediction model for the CUDA GPGPU platform
K. Srinathan | P. J. Narayanan | Kishore Kothapalli | M. Suhail Rehman | Suryakant Patidar | Rishabh Mukherjee