What Can We Learn Privately?

Learning problems form an important category of computational tasks that generalizes many of the computations researchers apply to large real-life data sets. We ask: what concept classes can be learned privately, namely, by an algorithm whose output does not depend too heavily on any one input or specific training example? More precisely, we investigate learning algorithms that satisfy differential privacy, a notion that provides strong confidentiality guarantees in contexts where aggregate information is released about a database containing sensitive information about individuals. We present several basic results that demonstrate the general feasibility of private learning and relate several models previously studied separately in the contexts of privacy and standard learning.
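
For readers unfamiliar with the term, the requirement that the output "does not depend too heavily on any one input" can be made precise with the standard definition of differential privacy. The statement below is a sketch of that definition; the symbols $\mathcal{A}$ (the randomized learning algorithm), $z, z'$ (two databases differing in a single entry), $S$ (a set of possible outputs), and $\epsilon$ (the privacy parameter) are notation assumed here for illustration and do not appear in the abstract itself.

```latex
% Standard \epsilon-differential privacy (notation assumed, not taken from the abstract):
% for every pair of databases z, z' that differ in exactly one entry,
% and every set S of possible outputs of the randomized algorithm \mathcal{A},
\[
  \Pr[\mathcal{A}(z) \in S] \;\le\; e^{\epsilon} \cdot \Pr[\mathcal{A}(z') \in S].
\]
```

A smaller $\epsilon$ means the two output distributions are closer, so changing any single training example has less influence on what the algorithm releases.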
