Relative Errors for Deterministic Low-Rank Matrix Approximations

We consider processing an n x d matrix A in a stream with row-wise updates according to a recent algorithm called Frequent Directions (Liberty, KDD 2013). This algorithm deterministically maintains an l x d matrix Q, processing each row in O(d l^2) time; the processing time can be reduced to O(d l) with a slight modification of the algorithm and a constant increase in space. We show that if one sets l = k + k/eps and returns Q_k, a k x d matrix that is the best rank-k approximation to Q, then the following properties hold, where A_k denotes the best rank-k approximation to A. First, ||A - A_k||_F^2 <= ||A||_F^2 - ||Q_k||_F^2 <= (1+eps) ||A - A_k||_F^2. Second, if pi_{Q_k}(A) denotes the projection of A onto the row space of Q_k, then ||A - pi_{Q_k}(A)||_F^2 <= (1+eps) ||A - A_k||_F^2. We also show that Frequent Directions cannot be adapted in an obvious way to a sparse version that retains l original rows of the matrix, as opposed to a linear combination or sketch of the rows.
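The guarantees above can be checked numerically on a small example. The following is a minimal sketch, not the authors' implementation: it assumes the simple O(d l^2)-per-row variant, in which each incoming row replaces the (always zero) last row of Q, an SVD is taken, and the squared smallest singular value is subtracted from every squared singular value. The function names `frequent_directions`, `best_rank_k`, and `error_terms` are hypothetical, l is rounded up to an integer, and l <= d is assumed for simplicity.

```python
import math
import numpy as np


def frequent_directions(A, ell):
    """Stream the rows of A through an ell x d sketch Q (assumes ell <= d)."""
    n, d = A.shape
    if ell > d:
        raise ValueError("this sketch assumes ell <= d")
    Q = np.zeros((ell, d))
    for a in A:
        Q[-1] = a  # the last row of Q is zero before every insertion
        _, s, Vt = np.linalg.svd(Q, full_matrices=False)
        delta = s[-1] ** 2  # squared smallest singular value
        shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
        Q = shrunk[:, None] * Vt  # last row becomes zero again
    return Q


def best_rank_k(Q, k):
    """Return the k x d matrix S_k V_k^T built from the top-k SVD of Q."""
    _, s, Vt = np.linalg.svd(Q, full_matrices=False)
    return s[:k, None] * Vt[:k]


def error_terms(A, k, eps):
    """Return (||A - A_k||_F^2, ||A||_F^2 - ||Q_k||_F^2, ||A - pi_{Q_k}(A)||_F^2)."""
    ell = math.ceil(k + k / eps)
    Qk = best_rank_k(frequent_directions(A, ell), k)
    sA = np.linalg.svd(A, compute_uv=False)
    opt = np.sum(sA[k:] ** 2)  # optimal rank-k error
    mid = np.linalg.norm(A, "fro") ** 2 - np.linalg.norm(Qk, "fro") ** 2
    # Orthonormal basis for the row space of Q_k, used to project A.
    _, _, Vt = np.linalg.svd(Qk, full_matrices=False)
    proj = np.linalg.norm(A - A @ Vt.T @ Vt, "fro") ** 2
    return opt, mid, proj


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 30))
    opt, mid, proj = error_terms(A, k=3, eps=0.5)
    print(opt, mid, proj, (1 + 0.5) * opt)
```

Because the bounds are deterministic, both the middle term and the projection error should fall between the optimal error and (1+eps) times it for any input matrix, up to floating-point rounding.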
