Sensitive Disclosures under Differential Privacy Guarantees

Non-independent reasoning (NIR) refers to inferring information about one record from other records, under the assumption that these records follow the same underlying distribution. Accurate NIR can disclose an individual's private information. Differential privacy rests on an important assumption: NIR is not considered a privacy violation. In this work, we investigate the extent to which an individual's private information may be disclosed through NIR by query answers that satisfy differential privacy. We first define what a disclosure through NIR means for randomized query answers. We then present a formal analysis of such disclosures by differentially private query answers. Our analysis on real-life datasets demonstrates that, while disclosures through NIR can be eliminated by adopting a more restrictive setting of differential privacy, such settings adversely affect the utility of query answers for data analysis. This conflict cannot be easily resolved because both disclosures and utility depend on the accuracy of the noisy query answers. This study suggests that, if disclosure through NIR is regarded as a privacy concern, differential privacy is not suitable because it does not provide both privacy and utility.
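The tension described above can be illustrated with a minimal sketch. It is not the paper's analysis. It assumes the standard Laplace mechanism for count queries (sensitivity 1), a hypothetical group of 1000 records of which 900 carry a sensitive value, and an adversary who estimates the sensitive rate as the ratio of two noisy counts. The function names and all numbers are illustrative assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Laplace mechanism for a count query (sensitivity 1 is an assumption)."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def nir_estimate(group_size, sensitive_count, epsilon, rng):
    """Adversary's estimate of P(sensitive | group) from two noisy counts.

    Each count is answered with budget epsilon, so the pair costs
    2 * epsilon in total under sequential composition.
    """
    denom = noisy_count(group_size, epsilon, rng)
    numer = noisy_count(sensitive_count, epsilon, rng)
    if denom <= 0:
        return 0.0
    # Clamp to a valid probability.
    return min(1.0, max(0.0, numer / denom))


def mean_abs_error(group_size, sensitive_count, epsilon, trials=2000, seed=0):
    """Average absolute error of the NIR estimate over repeated noisy answers."""
    rng = random.Random(seed)
    truth = sensitive_count / group_size
    total = 0.0
    for _ in range(trials):
        total += abs(nir_estimate(group_size, sensitive_count, epsilon, rng) - truth)
    return total / trials


if __name__ == "__main__":
    # Hypothetical group: 1000 records, 900 with the sensitive value.
    for eps in (1.0, 0.1, 0.01, 0.001):
        print(eps, mean_abs_error(1000, 900, eps))
```

In this sketch, the same noisy answers serve both the analyst and the adversary: a budget that keeps the ratio accurate enough for analysis also lets the adversary infer the sensitive rate of the group, and a budget small enough to hide that rate makes the answers unreliable for analysis as well.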
