Block Randomized Singular Value Decomposition on GPUs
