Auto-parallelization of data structure operations for GPUs
[1] P. J. Narayanan, et al. Fast minimum spanning tree for large graphs on the GPU, 2009, High Performance Graphics.
[2] Keshav Pingali, et al. Morph algorithms on GPUs, 2013, PPoPP '13.
[3] Andrey N. Chernikov, et al. Effective out-of-core parallel Delaunay mesh refinement using off-the-shelf software, 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[4] Eran Yahav, et al. Deriving linearizable fine-grained concurrent objects, 2008, PLDI '08.
[5] Wu-chun Feng, et al. Inter-block GPU communication via fast barrier synchronization, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[6] Matei Ripeanu, et al. A yoke of oxen and a thousand chickens for heavy lifting graph processing, 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[7] Keshav Pingali, et al. Atomic-free irregular computations on GPUs, 2013, GPGPU@ASPLOS.
[8] Donald Cohen, et al. Automating relational operations on data structures, 1993, IEEE Software.
[9] Keshav Pingali, et al. Optimistic parallelism requires abstractions, 2009, CACM.
[10] Armando Solar-Lezama, et al. Sketching concurrent data structures, 2008, PLDI '08.
[11] Alexander Aiken, et al. Concurrent data representation synthesis, 2012, PLDI.
[12] Thanh-Tung Cao, et al. Scalable parallel minimum spanning forest computation, 2012, PPoPP '12.
[13] Andrey N. Chernikov, et al. Fully Generalized Two-Dimensional Constrained Delaunay Mesh Refinement, 2010, SIAM J. Sci. Comput.
[14] Joe D. Warren, et al. The program dependence graph and its use in optimization, 1987, TOPLAS.
[15] Maurice Herlihy, et al. Linearizability: a correctness condition for concurrent objects, 1990, TOPLAS.
[16] Ondrej Lhoták, et al. Jedd: a BDD-based relational extension of Java, 2004, PLDI '04.
[17] Pascal Fradet, et al. Shape types, 1997, POPL '97.
[18] Kunle Olukotun, et al. Green-Marl: a DSL for easy and efficient graph analysis, 2012, ASPLOS XVII.
[19] Alexander Aiken, et al. Data representation synthesis, 2011, PLDI '11.
[20] Andrew S. Grimshaw, et al. Scalable GPU graph traversal, 2012, PPoPP '12.
[21] Rishabh Singh, et al. Synthesizing data structure manipulations from storyboards, 2011, ESEC/FSE '11.
[22] Martin D. F. Wong, et al. An effective GPU implementation of breadth-first search, 2010, Design Automation Conference.
[23] Jeanne Ferrante, et al. The program dependence graph and its use in optimization, 1987.
[24] Andrey N. Chernikov, et al. Three-dimensional Delaunay refinement for multi-core processors, 2008, ICS '08.
[25] Edmond Chow, et al. A Scalable Distributed Parallel Breadth-First Search Algorithm on BlueGene/L, 2005, ACM/IEEE SC 2005 Conference (SC'05).
[26] David A. Bader, et al. Designing Multithreaded Algorithms for Breadth-First Search and st-connectivity on the Cray MTA-2, 2006, 2006 International Conference on Parallel Processing (ICPP'06).
[27] P. J. Narayanan, et al. Accelerating Large Graph Algorithms on the GPU Using CUDA, 2007, HiPC.
[28] Kunle Olukotun, et al. Efficient Parallel Graph Exploration on Multi-Core CPU and GPU, 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[29] Keshav Pingali, et al. A GPU implementation of inclusion-based points-to analysis, 2012, PPoPP '12.