Analysis-driven Engineering of Comparison-based Sorting Algorithms on GPUs
Henri Casanova | John Iacono | Nodari Sitchinava | Ben Karsin | Volker Weichert
[1] K. Srinathan, et al. A performance prediction model for the CUDA GPGPU platform, 2009, 2009 International Conference on High Performance Computing (HiPC).
[2] Jatin Goyal, et al. Parallel binary search trees for rapid IP lookup using graphic processors, 2013, 2013 2nd International Conference on Information Management in the Knowledge Economy.
[3] Frank Dehne, et al. Deterministic Sample Sort for GPUs, 2010, Parallel Process. Lett.
[4] Henri Casanova, et al. Efficient Batched Predecessor Search in Shared Memory on GPUs, 2015, 2015 IEEE 22nd International Conference on High Performance Computing (HiPC).
[5] Koji Nakano, et al. Simple Memory Machine Models for GPUs, 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum.
[6] Franco Fummi, et al. A fine-grained performance model for GPU architectures, 2016, 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[7] Bruce Merry, et al. A Performance Comparison of Sort and Scan Libraries for GPUs, 2015, Parallel Process. Lett.
[8] Stephan Olariu, et al. Weighted and Unweighted Selection Algorithms for k Sorted Sequences, 1997, ISAAC.
[9] Nodari Sitchinava, et al. Provably Efficient GPU Algorithms, 2013, ArXiv.
[10] Koji Nakano, et al. The Hierarchical Memory Machine Model for GPUs, 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.
[11] Wen-mei W. Hwu, et al. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA, 2008, PPoPP.
[12] Alok Aggarwal, et al. The input/output complexity of sorting and related problems, 1988, CACM.
[13] Shubhabrata Sengupta, et al. Efficient Parallel Scan Algorithms for GPUs, 2011.
[14] Yao Zhang, et al. A quantitative performance analysis model for GPU architectures, 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[15] Nodari Sitchinava, et al. Sorting and Permuting without Bank Conflicts on GPUs, 2015, ESA.
[16] Timothy J. Purcell. Sorting and searching, 2005, SIGGRAPH Courses.
[17] Lin Ma, et al. A Memory Access Model for Highly-threaded Many-core Architectures, 2012, 2012 IEEE 18th International Conference on Parallel and Distributed Systems.
[18] Xin-She Yang, et al. Introduction to Algorithms, 2021, Nature-Inspired Optimization Algorithms.
[19] Kunihiko Sadakane, et al. A Novel Computational Model for GPUs with Applications to Efficient Algorithms, 2015, Int. J. Netw. Comput.
[20] Naga K. Govindaraju, et al. Fast scan algorithms on graphics processors, 2008, ICS '08.
[21] Krzysztof Kaczmarski, et al. Experimental B+-tree for GPU, 2011, ADBIS.
[22] P. Sadayappan, et al. Characterizing and enhancing global memory data coalescing on GPUs, 2015, 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[23] Vitaly Osipov, et al. GPU sample sort, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[24] Hyesoon Kim, et al. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness, 2009, ISCA '09.
[25] Yitzhak Birk, et al. Merge Path - A Visually Intuitive Approach to Parallel Merging, 2014, ArXiv.
[26] Pablo Enfedaque, et al. Implementation of the DWT in a GPU through a Register-based Strategy, 2015, IEEE Transactions on Parallel and Distributed Systems.
[27] Andrew S. Grimshaw, et al. Revisiting sorting for GPGPU stream architectures, 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[28] Lin Ma, et al. Performance modeling for highly-threaded many-core GPUs, 2014, 2014 IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors.
[29] Andrew S. Grimshaw, et al. Parallel Scan for Stream Architectures, 2012.
[30] Donald E. Knuth, et al. The art of computer programming, volume 3: (2nd ed.) sorting and searching, 1998.
[31] Ronald L. Rivest, et al. Introduction to Algorithms, Second Edition, 2001.
[32] Andreas Moshovos, et al. Demystifying GPU microarchitecture through microbenchmarking, 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).
[33] Ben Karsin, et al. A Performance Model For Gpu Architectures: Analysis And Design Of Fundamental Algorithms, 2018.
[34] P. J. Narayanan, et al. Discrete range searching primitive for the GPU and its applications, 2012, JEAL.
[35] David A. Bader, et al. GPU merge path: a GPU merging algorithm, 2012, ICS '12.
[36] Clifford Stein, et al. Introduction to Algorithms, 2nd edition, 2001.
[37] Mark de Berg, et al. Computational geometry: algorithms and applications, 1997.
[38] Michael Garland, et al. A decomposition for in-place matrix transposition, 2014, PPoPP '14.