Single Pass Entrywise-Transformed Low Rank Approximation

In applications such as natural language processing or computer vision, one is given a large n×d matrix A = (a_{i,j}) and would like to compute a matrix decomposition, e.g., a low rank approximation, of a function f(A) = (f(a_{i,j})) applied entrywise to A. A very important special case is the likelihood function f(A)_{i,j} = log(|a_{i,j}| + 1). A natural way to do this would be to simply apply f to each entry of A and then compute the matrix decomposition, but this requires storing all of A as well as multiple passes over its entries. Recent work of Liang et al. shows how to find a rank-k factorization of f(A) for an n×n matrix A using only n·poly(ε⁻¹ k log n) words of memory, with overall error 10‖f(A) − [f(A)]_k‖²_F + poly(ε/k)‖f(A)‖²_{1,2}, where [f(A)]_k is the best rank-k approximation to f(A) and ‖f(A)‖²_{1,2} is the square of the sum of Euclidean lengths of rows of f(A). Their algorithm uses three passes over the entries of A. The authors pose the open question of obtaining an algorithm with n·poly(ε⁻¹ k log n) words of memory using only a single pass over the entries of A. In this paper we resolve this open question, obtaining the first single-pass algorithm for this problem and for the same class of functions f studied by Liang et al. Moreover, our error is ‖f(A) − [f(A)]_k‖²_F + poly(ε/k)‖f(A)‖²_F, where ‖f(A)‖²_F is the sum of squares of Euclidean lengths of rows of f(A). Thus our error is significantly smaller, as it removes the factor of 10 and also ‖f(A)‖_F ≤ ‖f(A)‖_{1,2}. We also give an algorithm for regression, pointing out an error in previous work, and empirically validate our results.

Tianjin University, China; School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore; Wuhan University of Technology, China; Department of Computer Science, Carnegie Mellon University, USA. Correspondence to: Yi Li <yili@ntu.edu.sg>, David P. Woodruff <dwoodruf@andrew.cmu.edu>.

Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021.
Copyright 2021 by the author(s).
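
The "natural way" the abstract contrasts against can be sketched in a few lines of numpy. This is a hypothetical illustration of the naive baseline only, not the paper's single-pass streaming algorithm: apply f entrywise (which requires storing all of A), then take the best rank-k approximation of f(A) via a truncated SVD. The function names `f` and `naive_low_rank` are my own labels for this sketch.

```python
import numpy as np

def f(x):
    # The special case highlighted in the abstract: f(a) = log(|a| + 1).
    return np.log(np.abs(x) + 1.0)

def naive_low_rank(A, k):
    """Best rank-k approximation of f(A) in Frobenius norm (Eckart-Young)."""
    FA = f(A)  # entrywise transform; needs all of A in memory at once
    U, s, Vt = np.linalg.svd(FA, full_matrices=False)
    # Keep the top-k singular triples.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 40))
FA_k = naive_low_rank(A, k=5)
# ||f(A) - [f(A)]_k||_F is the benchmark term appearing in both error bounds.
err = np.linalg.norm(f(A) - FA_k)
```

The streaming algorithms discussed in the paper aim to approach this baseline's error ‖f(A) − [f(A)]_k‖_F (up to the additive poly(ε/k) term) without ever materializing f(A) in full.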

[1] David P. Woodruff et al. Non-Adaptive Adaptive Sampling on Turnstile Streams, 2020, STOC.

[2] David P. Woodruff et al. Distributed Low Rank Approximation of Implicit Functions of a Matrix, 2016, IEEE 32nd International Conference on Data Engineering (ICDE).

[3] Jalaj Upadhyay et al. Differentially Private Linear Algebra in the Streaming Model, 2014, IACR Cryptol. ePrint Arch.

[4] David P. Woodruff et al. Fast Approximation of Matrix Coherence and Statistical Leverage, 2011, ICML.

[5] Alan M. Frieze et al. Fast Monte-Carlo Algorithms for Finding Low-Rank Approximations, 2004, JACM.

[6] David P. Woodruff et al. Streaming Space Complexity of Nearly All Functions of One Variable on Frequency Vectors, 2016, PODS.

[7] Yingyu Liang et al. Sketching Transformed Matrices with Applications to Natural Language Processing, 2020, AISTATS.

[8] Alexandr Andoni et al. Streaming Algorithms via Precision Sampling, 2011, IEEE 52nd Annual Symposium on Foundations of Computer Science (FOCS).

[9] Christos Boutsidis et al. Optimal Principal Component Analysis in Distributed and Streaming Models, 2015, STOC.

[10] David P. Woodruff et al. Efficient Sketches for Earth-Mover Distance, with Applications, 2009, 50th Annual IEEE Symposium on Foundations of Computer Science (FOCS).

[11] David P. Woodruff et al. Exponentially Improved Dimensionality Reduction for ℓ1: Subspace Embeddings and Independence Testing, 2021, COLT.

[12] David P. Woodruff et al. Numerical Linear Algebra in the Streaming Model, 2009, STOC.

[13] David P. Woodruff et al. Robust Subspace Approximation in a Stream, 2018, NeurIPS.

[14] David P. Woodruff. Low Rank Approximation Lower Bounds in Row-Update Streams, 2014, NIPS.

[15] Ashwin Lall et al. Streaming Pointwise Mutual Information, 2009, NIPS.