Analysis of classic algorithms on highly-threaded many-core architectures
Lin Ma | Roger D. Chamberlain | Chen Tian | Kunal Agrawal | Ziang Hu
[1] Alok Aggarwal, et al. The input/output complexity of sorting and related problems, 1988, CACM.
[2] Andrew S. Grimshaw, et al. Scalable GPU graph traversal, 2012, PPoPP '12.
[3] Ronald L. Rivest, et al. Introduction to Algorithms, 1990.
[4] Weiguo Liu, et al. Streaming Algorithms for Biological Sequence Alignment on GPUs, 2007, IEEE Transactions on Parallel and Distributed Systems.
[5] Richard W. Vuduc, et al. A performance analysis framework for identifying potential benefits in GPGPU applications, 2012, PPoPP '12.
[6] Alok Aggarwal, et al. Hierarchical memory with block transfer, 1987, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).
[7] M. S. Waterman, et al. Identification of common molecular subsequences, 1981, Journal of molecular biology.
[8] Lin Ma, et al. Analysis of classic algorithms on GPUs, 2014, 2014 International Conference on High Performance Computing & Simulation (HPCS).
[9] Shahid H. Bokhari, et al. A comparison of the Cray XMT and XMT‐2, 2013, Concurr. Comput. Pract. Exp.
[10] Ramesh Subramonian, et al. LogP: towards a realistic model of parallel computation, 1993, PPoPP '93.
[11] James Christopher Wyllie, et al. The Complexity of Parallel Computations, 1979.
[12] Larry Carter, et al. Multi-processor Performance on the Tera MTA, 1998, Proceedings of the IEEE/ACM SC98 Conference.
[13] Bowen Alpern, et al. A model for hierarchical memory, 1987, STOC.
[14] Nuno Roma, et al. Advantages and GPU implementation of high-performance indexed DNA search based on suffix arrays, 2011, 2011 International Conference on High Performance Computing & Simulation.
[15] Koji Nakano, et al. The Hierarchical Memory Machine Model for GPUs, 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.
[16] Lin Ma, et al. A Memory Access Model for Highly-threaded Many-core Architectures, 2012, 2012 IEEE 18th International Conference on Parallel and Distributed Systems.
[17] Bowen Alpern, et al. Visualizing computer memory architectures, 1990, Proceedings of the First IEEE Conference on Visualization: Visualization '90.
[18] Lin Ma, et al. Theoretical analysis of classic algorithms on highly-threaded many-core GPUs, 2014, PPoPP '14.
[19] Naga K. Govindaraju, et al. High performance discrete Fourier transforms on graphics processors, 2008, HiPC 2008.
[20] Lin Ma, et al. Performance modeling for highly-threaded many-core GPUs, 2014, 2014 IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors.
[21] Leslie G. Valiant, et al. A bridging model for parallel computation, 1990, CACM.
[22] P. J. Narayanan, et al. Fast minimum spanning tree for large graphs on the GPU, 2009, High Performance Graphics.
[23] George C. Caragea, et al. Models for Advancing PRAM and Other Algorithms into Parallel Programs for a PRAM-On-Chip Platform, 2006, Handbook of Parallel Computing.
[24] Michael A. Bender, et al. Concurrent cache-oblivious b-trees, 2005, SPAA '05.
[25] Vijaya Ramachandran, et al. Oblivious algorithms for multicores and network of processors, 2010, IPDPS.
[26] Marc Moreno Maza, et al. A Many-Core Machine Model for Designing Algorithms with Minimum Parallelism Overheads, 2014, PARCO.
[27] Cynthia A. Phillips, et al. Two-Level Main Memory Co-Design: Multi-threaded Algorithmic Primitives, Analysis, and Simulation, 2015, IPDPS.
[28] Vijaya Ramachandran, et al. Cache-efficient dynamic programming algorithms for multicores, 2008, SPAA '08.
[29] Jeffrey Scott Vitter, et al. Algorithms for parallel memory, I: Two-level memories, 2005, Algorithmica.
[30] P. J. Narayanan, et al. Some GPU Algorithms for Graph Connected Components and Spanning Tree, 2010, Parallel Process. Lett.
[31] Allan Porterfield, et al. The Tera computer system, 1990.
[32] Lin Ma, et al. Bloom Filter Performance on Graphics Engines, 2011, 2011 International Conference on Parallel Processing.
[33] Michael Garland, et al. Designing efficient sorting algorithms for manycore GPUs, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[34] Hyesoon Kim, et al. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness, 2009, ISCA '09.
[35] M. Lanzagorta, et al. Early Experience with Scientific Programs on the Cray MTA-2, 2003, ACM/IEEE SC 2003 Conference (SC'03).
[36] Alfred V. Aho, et al. The Design and Analysis of Computer Algorithms, 1974.
[37] Bowen Alpern, et al. The uniform memory hierarchy model of computation, 2005, Algorithmica.
[38] Guy E. Blelloch, et al. Provably good multicore cache performance for divide-and-conquer algorithms, 2008, SODA '08.
[39] Matteo Frigo, et al. Cache-oblivious algorithms, 1999, 40th Annual Symposium on Foundations of Computer Science.
[40] Eugene W. Myers, et al. Suffix arrays: a new method for on-line string searches, 1993, SODA '90.
[41] Yao Zhang, et al. A quantitative performance analysis model for GPU architectures, 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[42] Jeffrey Scott Vitter, et al. Large-Scale Sorting in Uniform Memory Hierarchies, 1993, J. Parallel Distributed Comput.
[43] Michael T. Goodrich, et al. Fundamental parallel algorithms for private-cache chip multiprocessors, 2008, SPAA '08.
[44] Steven Fortune, et al. Parallelism in random access machines, 1978, STOC.
[45] Lin Ma, et al. A Performance Model for Memory Bandwidth Constrained Applications on Graphics Engines, 2012, 2012 IEEE 23rd International Conference on Application-Specific Systems, Architectures and Processors.
[46] Guy E. Blelloch, et al. Scheduling irregular parallel computations on hierarchical caches, 2011, SPAA '11.
[47] Ovidiu Daescu, et al. A Parallel Algorithm Development Model for the GPU Architecture, 2012.