Reducing the amount of out‐of‐core data access for GPU‐accelerated randomized SVD
Yasuyuki Matsushita | Stanimire Tomov | Fumihiko Ino | Ichitaro Yamazaki | Jack Dongarra | Yuechao Lu