Privacy in Statistical Databases

Interval protection, or partial cell suppression, was introduced in “M. Fischetti, J.-J. Salazar, Partial cell suppression: A new methodology for statistical disclosure control, Statistics and Computing, 13, 13–21, 2003” as a “linearization” of the difficult cell suppression problem. Interval protection replaces some cells with intervals containing the original cell value, whereas cell suppression removes the values altogether. Although the resulting optimization problem is still huge, as in cell suppression, it is linear, which allows efficient procedures to be applied. In this work we present preliminary results with a prototype implementation of Benders decomposition for interval protection. Although the seminal publication cited above applied a similar methodology, our approach differs in two respects: (i) the boundaries of the intervals are completely independent in our implementation, whereas the 2003 implementation solved a simpler variant in which the boundaries must satisfy a certain ratio; (ii) our prototype is applied to a set of seven general and hierarchical tables, whereas only three two-dimensional tables were solved with the 2003 implementation.
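To make the idea of publishing intervals concrete, the following toy sketch shows what an attacker can infer about one cell when a table row is published as intervals together with its exact total. It is only an illustration under simplifying assumptions (a single additive constraint, a hypothetical helper named `attacker_bounds`, invented numbers); it is not the Benders decomposition described above, which handles general and hierarchical tables.

```python
def attacker_bounds(intervals, total, i):
    """Tightest range an attacker can deduce for cell i.

    Assumes the cells sum exactly to `total` and each cell j is known
    to lie in intervals[j] = (lower_j, upper_j). Hypothetical helper,
    for illustration only.
    """
    lo_i, hi_i = intervals[i]
    others_lo = sum(lo for j, (lo, _) in enumerate(intervals) if j != i)
    others_hi = sum(hi for j, (_, hi) in enumerate(intervals) if j != i)
    # Cell i equals total minus the other cells, so its range is limited
    # both by its own published interval and by the ranges of the others.
    return max(lo_i, total - others_hi), min(hi_i, total - others_lo)


# True (confidential) row: 20 + 35 + 45 = 100; the total 100 is published.
total = 100

# Only cell 0 is replaced by an interval: the total reveals it exactly.
weak = [(15, 25), (35, 35), (45, 45)]
print(attacker_bounds(weak, total, 0))

# Cells 0 and 2 are both replaced by intervals: cell 0 stays uncertain.
better = [(15, 25), (35, 35), (40, 50)]
print(attacker_bounds(better, total, 0))
```

In the first case the attacker recovers the value 20 exactly, which is why intervals on additional cells are needed; choosing those intervals at minimum information loss is the optimization problem the abstract refers to.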
