On Low-Space Differentially Private Low-rank Factorization in the Spectral Norm

Low-rank factorization is used in many areas of computer science where one performs spectral analysis on large, sensitive data stored in the form of matrices. In this paper, we study differentially private low-rank factorization of a matrix with respect to the spectral norm in the turnstile update model. In this problem, given an input matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ updated in the turnstile manner and a target rank $k$, the goal is to find two rank-$k$ matrices with orthonormal columns, $\mathbf{U}_k \in \mathbb{R}^{m \times k}$ and $\mathbf{V}_k \in \mathbb{R}^{n \times k}$, and a positive semidefinite diagonal matrix $\mathbf{\Sigma}_k \in \mathbb{R}^{k \times k}$, such that $\mathbf{A} \approx \mathbf{U}_k \mathbf{\Sigma}_k \mathbf{V}_k^\mathsf{T}$ with respect to the spectral norm. Our main contributions are two computationally efficient, sublinear-space algorithms for computing a differentially private low-rank factorization. We consider two levels of privacy: in the first, two matrices are neighboring if their difference has Frobenius norm at most $1$; in the second, two matrices are neighboring if their difference can be written as an outer product of two unit vectors. Both privacy levels are stronger than those studied in earlier work such as Dwork {\it et al.} (STOC 2014), Hardt and Roth (STOC 2013), and Hardt and Price (NIPS 2014). As a corollary, we obtain non-private algorithms that compute a low-rank factorization in the turnstile update model with respect to the spectral norm. Prior to this work, no algorithm was known that outputs a low-rank factorization with respect to the spectral norm in the turnstile update model; that is, ours is the first low-rank factorization with respect to the spectral norm in the turnstile update model, even without the privacy constraint.
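
To make the problem statement concrete, the following is a minimal, non-private NumPy sketch; it is not the paper's algorithm (which is differentially private and uses sublinear space). It materializes $\mathbf{A}$ from a stream of turnstile updates, computes a rank-$k$ factorization via truncated SVD, reports the spectral-norm error $\|\mathbf{A} - \mathbf{U}_k \mathbf{\Sigma}_k \mathbf{V}_k^\mathsf{T}\|_2$, and checks the two neighboring relations that define the two privacy levels. All function names are illustrative.

```python
import numpy as np


def apply_turnstile_updates(m, n, updates):
    """Accumulate a stream of turnstile updates (i, j, delta) into A.

    This materializes the full m x n matrix only to make the problem
    statement concrete; the paper's algorithms avoid this by maintaining
    small sketches of A instead.
    """
    A = np.zeros((m, n))
    for i, j, delta in updates:
        A[i, j] += delta  # entries may be incremented or decremented
    return A


def rank_k_factorization(A, k):
    """Rank-k factorization (U_k, Sigma_k, V_k) via truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], np.diag(s[:k]), Vt[:k, :].T


def spectral_error(A, U_k, Sigma_k, V_k):
    """Spectral-norm error ||A - U_k Sigma_k V_k^T||_2."""
    return np.linalg.norm(A - U_k @ Sigma_k @ V_k.T, ord=2)


def frobenius_neighbors(A, B):
    """First privacy level: A and B are neighbors if ||A - B||_F <= 1."""
    return np.linalg.norm(A - B, ord="fro") <= 1.0


def outer_product_neighbors(A, B, tol=1e-9):
    """Second privacy level: A - B = u v^T for unit vectors u and v,
    i.e., the difference is rank-1 with spectral norm exactly 1."""
    s = np.linalg.svd(A - B, compute_uv=False)
    return abs(s[0] - 1.0) <= tol and np.all(s[1:] <= tol)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, n, k = 60, 40, 5
    updates = [(rng.integers(m), rng.integers(n), rng.standard_normal())
               for _ in range(2000)]
    A = apply_turnstile_updates(m, n, updates)

    U_k, Sigma_k, V_k = rank_k_factorization(A, k)
    print("spectral error :", spectral_error(A, U_k, Sigma_k, V_k))
    # By the Eckart-Young theorem this equals sigma_{k+1}(A), the best
    # spectral-norm error achievable by any rank-k factorization.
    print("sigma_{k+1}(A) :", np.linalg.svd(A, compute_uv=False)[k])

    # Second neighboring relation: perturb A by u v^T for unit u, v.
    u = rng.standard_normal(m); u /= np.linalg.norm(u)
    v = rng.standard_normal(n); v /= np.linalg.norm(v)
    print("outer-product neighbors:",
          outer_product_neighbors(A + np.outer(u, v), A))
```

By the Eckart-Young theorem, the truncated SVD attains the smallest possible spectral-norm error, $\sigma_{k+1}(\mathbf{A})$, among all rank-$k$ factorizations; the challenge addressed in the paper is approaching this benchmark while guaranteeing differential privacy and using only sublinear space over the turnstile stream.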

[1]  Jalaj Upadhyay,et al.  Randomness Efficient Fast-Johnson-Lindenstrauss Transform with Applications in Differential Privacy and Compressed Sensing , 2014, 1410.2470.

[2]  David P. Woodruff,et al.  Low rank approximation and regression in input sparsity time , 2012, STOC '13.

[3]  David P. Woodruff,et al.  Numerical linear algebra in the streaming model , 2009, STOC '09.

[4]  Alan M. Frieze,et al.  Fast monte-carlo algorithms for finding low-rank approximations , 2004, JACM.

[5]  Trac D. Tran,et al.  A fast and efficient algorithm for low-rank approximation of a matrix , 2009, STOC '09.

[6]  Moritz Hardt,et al.  The Noisy Power Method: A Meta Algorithm with Applications , 2013, NIPS.

[7]  Zohar S. Karnin,et al.  Online PCA with Spectral Bounds , 2015 .

[8]  Santosh S. Vempala,et al.  Matrix approximation and projective clustering via volume sampling , 2006, SODA '06.

[9]  Tamás Sarlós,et al.  Improved Approximation Algorithms for Large Matrices via Random Projections , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[10]  A. Rantzer,et al.  On a generalized matrix approximation problem in the spectral norm , 2012 .

[11]  Kunal Talwar,et al.  On differentially private low rank approximation , 2013, SODA.

[12]  Zhihua Zhang,et al.  Wishart Mechanism for Differentially Private Principal Components Analysis , 2015, AAAI.

[13]  Nathan Halko,et al.  Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions , 2009, SIAM Rev..

[14]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2003, ICTAI.

[15]  Mark Rudelson,et al.  Sampling from large matrices: An approach through geometric functional analysis , 2005, JACM.

[16]  W. B. Johnson,et al.  Extensions of Lipschitz mappings into Hilbert space , 1984 .

[17]  Prabhakar Raghavan,et al.  Competitive recommendation systems , 2002, STOC '02.

[18]  Aaron Roth,et al.  Beyond worst-case analysis in private singular vector computation , 2012, STOC '13.

[19]  Moni Naor,et al.  Differential privacy under continual observation , 2010, STOC '10.

[20]  Vitaly Shmatikov,et al.  Robust De-anonymization of Large Sparse Datasets , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[21]  Pravesh Kothari,et al.  Differentially Private Online Learning , 2012, COLT.

[22]  Moni Naor,et al.  Our Data, Ourselves: Privacy Via Distributed Noise Generation , 2006, EUROCRYPT.

[23]  Cynthia Dwork,et al.  Practical privacy: the SuLQ framework , 2005, PODS.

[24]  Daniel M. Kane,et al.  Sparser Johnson-Lindenstrauss Transforms , 2010, JACM.

[25]  Jon Kleinberg,et al.  Authoritative sources in a hyperlinked environment , 1999, SODA '98.

[26]  Jalaj Upadhyay,et al.  Differentially Private Linear Algebra in the Streaming Model , 2014, IACR Cryptol. ePrint Arch..

[27]  Michael B. Cohen,et al.  Dimensionality Reduction for k-Means Clustering and Low Rank Approximation , 2014, STOC.

[28]  Elaine Shi,et al.  Private and Continual Release of Statistics , 2010, TSEC.

[29]  Jalaj Upadhyay,et al.  Random Projections, Graph Sparsification, and Differential Privacy , 2013, ASIACRYPT.

[30]  Christos Boutsidis,et al.  Optimal principal component analysis in distributed and streaming models , 2015, STOC.

[31]  Dimitris Achlioptas,et al.  Fast computation of low rank matrix approximations , 2001, STOC '01.

[32]  Jalaj Upadhyay,et al.  Circulant Matrices and Differential Privacy , 2014, IACR Cryptol. ePrint Arch..

[33]  Amos Fiat,et al.  Web search via hub synthesis , 2001, Proceedings 42nd IEEE Symposium on Foundations of Computer Science (FOCS 2001).

[34]  Aaron Roth,et al.  The Algorithmic Foundations of Differential Privacy , 2014, Found. Trends Theor. Comput. Sci..

[35]  Anna R. Karlin,et al.  Spectral analysis of data , 2001, STOC '01.

[36]  Aaron Roth,et al.  Beating randomized response on incoherent matrices , 2011, STOC '12.

[37]  Santosh S. Vempala,et al.  Adaptive Sampling and Fast Low-Rank Matrix Approximation , 2006, APPROX-RANDOM.

[38]  Santosh S. Vempala,et al.  The Spectral Method for General Mixture Models , 2008, SIAM J. Comput..

[39]  Dimitris Achlioptas,et al.  On Spectral Learning of Mixtures of Distributions , 2005, COLT.

[40]  Alan M. Frieze,et al.  Clustering Large Graphs via the Singular Value Decomposition , 2004, Machine Learning.

[41]  Frank McSherry,et al.  Spectral partitioning of random graphs , 2001, Proceedings 42nd IEEE Symposium on Foundations of Computer Science (FOCS 2001).

[42]  Avner Magen,et al.  Low rank matrix-valued chernoff bounds and approximate matrix multiplication , 2010, SODA '11.

[43]  Avrim Blum,et al.  The Johnson-Lindenstrauss Transform Itself Preserves Differential Privacy , 2012, 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science.

[44]  Jalaj Upadhyay,et al.  The Price of Differential Privacy for Low-Rank Factorization , 2016 .

[46]  M. Rudelson,et al.  Non-asymptotic theory of random matrices: extreme singular values , 2010, 1003.2990.

[47]  Petros Drineas,et al.  FAST MONTE CARLO ALGORITHMS FOR MATRICES II: COMPUTING A LOW-RANK APPROXIMATION TO A MATRIX∗ , 2004 .

[48]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[49]  Li Zhang,et al.  Analyze gauss: optimal bounds for privacy-preserving principal component analysis , 2014, STOC.

[50]  Anand D. Sarwate,et al.  Near-optimal Differentially Private Principal Components , 2012, NIPS.

[51]  Santosh S. Vempala,et al.  Latent semantic indexing: a probabilistic analysis , 1998, PODS '98.

[52]  Adam D. Smith,et al.  (Nearly) Optimal Algorithms for Private Online Learning in Full-information and Bandit Settings , 2013, NIPS.

[53]  Michael W. Mahoney,et al.  Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression , 2012, STOC '13.

[54]  Guy N. Rothblum,et al.  Boosting and Differential Privacy , 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.