Towards a Systematic Analysis of Privacy Definitions

In statistical privacy, a privacy definition is regarded as the set of algorithms that are allowed to process sensitive data. It is often helpful to consider the complementary view that privacy definitions are also contracts that guide the behavior of algorithms that take in sensitive data and produce sanitized data. Historically, data privacy breaches have been the result of fundamental misunderstandings about what a particular privacy definition guarantees. Privacy definitions are often analyzed using a highly targeted approach: a specific attack strategy is evaluated to determine whether a specific type of information can be inferred. If the attack works, one can conclude that the privacy definition is too weak. If it does not work, one often gains little information about its security (perhaps a slightly different attack would have succeeded). Furthermore, such targeted analyses cannot identify cases where a privacy definition protects unnecessary pieces of information. On the other hand, technical results concerning generalizable and systematic analyses of privacy are few in number, but such results have significantly advanced our understanding of how privacy definitions should be designed. We add to this literature a novel methodology for analyzing the Bayesian properties of a privacy definition. Its goal is to identify precisely the type of information being protected, thereby making it easier to identify (and later remove) unnecessary data protections. Using privacy building blocks (which we refer to as axioms), we turn questions about semantics into mathematical problems: the construction of a consistent normal form and the subsequent construction of the row cone, a geometric object that encapsulates the Bayesian guarantees provided by a privacy definition. We apply these ideas to study randomized response, FRAPP/PRAM, and several algorithms that add integer-valued noise to their inputs; we show that their privacy properties can be stated in terms of the protection of various notions of parity of a dataset. Randomized response, in particular, provides unnecessarily strong protections for parity, and so we also show how our methodology can be used to relax overly restrictive privacy definitions.
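
To make the first of the studied mechanisms concrete, the following is a minimal Python sketch of Warner-style randomized response on binary attributes. The function names, the default flip probability, and the estimator helper are illustrative assumptions for this sketch, not notation taken from the paper.

    import random

    def randomized_response(bits, p=0.75):
        """Warner's randomized response: report each sensitive bit
        truthfully with probability p and flipped with probability
        1 - p. Requires p != 0.5 for the output to stay informative."""
        return [b if random.random() < p else 1 - b for b in bits]

    def estimate_proportion(reported, p=0.75):
        """Unbiased estimate of the true fraction of 1s, obtained by
        inverting E[reported mean] = p * mu + (1 - p) * (1 - mu)."""
        observed = sum(reported) / len(reported)
        return (observed - (1 - p)) / (2 * p - 1)

    # Aggregate statistics survive the noise, while any individual
    # reported bit is deniable. Because each bit is flipped
    # independently, the parity (XOR) of the dataset is itself
    # randomized, which is the kind of guarantee the row-cone
    # analysis in this paper makes precise.
    true_bits = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
    noisy = randomized_response(true_bits)
    print(noisy, estimate_proportion(noisy))

On a dataset of ten bits the estimate is noisy; the inversion in estimate_proportion becomes accurate only as the number of respondents grows.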
