Padding free bank conflict resolution for CUDA-based matrix transpose algorithm
Mayez A. Al-Mouhamed | Abdulrahman Baqais | A. Khan | Allam Fatayar | Anas Almousa | Mohammed Assayony