Synchronization Trade-Offs in GPU Implementations of Graph Algorithms