CPU versus GPU: which can perform matrix computation faster—performance comparison for basic linear algebra subprograms
Xiaofeng Zhang | Yunming Ye | Feng Li | Zhaoyang Tian