Analyze Gauss: optimal bounds for privacy-preserving principal component analysis

We consider the problem of privately releasing a low-dimensional approximation to a set of data records, represented as a matrix A in which each row corresponds to an individual and each column to an attribute. Our goal is to compute a subspace that captures as much of the covariance of A as possible, a task classically known as principal component analysis (PCA). We assume that each row of A has ℓ2 norm bounded by one, and the privacy guarantee is defined with respect to the addition or removal of any single row. We show that the well-known, but misnamed, randomized response algorithm, with properly tuned parameters, achieves a nearly optimal additive quality gap compared to the best possible singular subspace of A. We further show that when A^T A has a large eigenvalue gap (a reason often cited for using PCA), the quality improves significantly. Optimality (up to logarithmic factors) is proved using techniques inspired by the recent work of Bun, Ullman, and Vadhan on applying Tardos's fingerprinting codes to the construction of hard instances for private mechanisms for 1-way marginal queries. Along the way we define a list culling game, which may be of independent interest. By combining the randomized response mechanism with the well-known follow-the-perturbed-leader algorithm of Kalai and Vempala, we obtain a private online algorithm with nearly optimal regret. The regret of our algorithm even improves on that of all previously known non-private online algorithms of this type. We achieve this better bound by, satisfyingly, borrowing insights and tools from differential privacy!
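The abstract does not spell out the mechanism, so the following is only a minimal sketch of one common reading of it: perturb the covariance matrix A^T A with symmetric Gaussian noise, then return the top-k eigenvectors of the noisy matrix. The function names (`analyze_gauss`, `captured_variance`), the parameters `eps` and `delta`, and the noise scale sqrt(2 ln(1.25/delta))/eps (the standard Gaussian-mechanism calibration, usually stated for eps < 1) are assumptions made for illustration and are not taken from the text above.

```python
import numpy as np


def analyze_gauss(A, k, eps, delta, rng=None):
    """Sketch: top-k subspace of A^T A plus symmetric Gaussian noise.

    Assumes every row of A has l2 norm at most 1, as in the abstract.
    The noise calibration below is an assumption, not quoted from the source.
    """
    rng = np.random.default_rng() if rng is None else rng
    A = np.asarray(A, dtype=float)
    if np.any(np.linalg.norm(A, axis=1) > 1.0 + 1e-12):
        raise ValueError("each row of A must have l2 norm at most 1")

    d = A.shape[1]
    # Adding or removing one row a changes A^T A by a a^T, whose
    # Frobenius norm is at most 1, so the sensitivity is taken to be 1.
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) / eps

    # Draw the upper triangle (including the diagonal) i.i.d., then mirror it
    # so that the noise matrix E is symmetric.
    upper = np.triu(rng.normal(0.0, sigma, size=(d, d)))
    E = upper + np.triu(upper, 1).T

    noisy_cov = A.T @ A + E
    # eigh returns eigenvalues in ascending order; keep the last k eigenvectors.
    _, vecs = np.linalg.eigh(noisy_cov)
    return vecs[:, ::-1][:, :k]


def captured_variance(A, V):
    """Squared Frobenius norm of A projected onto the columns of V."""
    return float(np.linalg.norm(np.asarray(A, dtype=float) @ V, "fro") ** 2)
```

Under these assumptions, the "additive quality gap" in the abstract corresponds to the difference between the sum of the k largest eigenvalues of A^T A and `captured_variance(A, V)` for the returned V. That difference is never negative, because no k-dimensional subspace captures more than the top-k singular subspace does.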
