Perturbative Data Protection of Multivariate Nominal Datasets

Many of the potentially sensitive personal data produced and compiled in electronic sources are nominal and multi-attribute (e.g., personal interests, healthcare diagnoses, commercial transactions). For such data, which are discrete, finite and non-ordinal, privacy-protection methods should mask the original values to prevent disclosure while preserving both the underlying semantics of the nominal attributes and any correlation between them. In this paper, we tackle this challenge by proposing a semantically grounded version of numerical correlated noise addition that, by relying on structured knowledge sources (ontologies), can perturb and mask multivariate nominal attributes while reasonably preserving their semantics and correlations.
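For orientation, the numerical technique that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of standard correlated noise addition for numerical data only, not the semantic method for nominal data proposed in the paper: the masked data are Z = X + e with e drawn from N(0, alpha * Sigma), where Sigma is the covariance of X, so the covariance of Z is approximately (1 + alpha) * Sigma and the correlations are approximately unchanged. The function name, the parameter `alpha`, and the synthetic data are assumptions made for this example.

```python
import numpy as np

def correlated_noise_addition(X, alpha, rng):
    """Mask numerical data X (n rows, d columns) with noise ~ N(0, alpha * Sigma).

    Hypothetical helper for illustration; Sigma is the sample covariance of X.
    """
    sigma = np.cov(X, rowvar=False)  # sample covariance of the original attributes
    noise = rng.multivariate_normal(
        np.zeros(X.shape[1]), alpha * sigma, size=X.shape[0]
    )
    return X + noise

# Synthetic two-attribute dataset with correlation 0.8 (assumed data, not from the paper).
rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.8], [0.8, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], true_cov, size=20000)

Z = correlated_noise_addition(X, alpha=0.5, rng=rng)

# Correlation between the two attributes before and after masking.
corr_original = np.corrcoef(X, rowvar=False)[0, 1]
corr_masked = np.corrcoef(Z, rowvar=False)[0, 1]

# Ratio of masked to original variance for the first attribute, expected near 1 + alpha.
variance_ratio = np.var(Z[:, 0], ddof=1) / np.var(X[:, 0], ddof=1)
```

Because the noise covariance is proportional to the data covariance, every record is perturbed while the correlation structure is retained up to sampling error. Nominal values have no arithmetic of this kind, which is why the abstract turns to ontologies to give the operation a semantic meaning.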
