Semantic Noise: Privacy-Protection of Nominal Microdata through Uncorrelated Noise Addition

Personal data are of great interest for statistical studies and for providing personalized services, but their release may compromise the privacy of individuals. To protect privacy, we present the notion and practical enforcement of semantic noise, a semantically grounded version of the numerical uncorrelated noise addition method that can mask textual data while preserving their semantics. Unlike other perturbative masking schemes, our method works both on datasets containing information about several individuals and on single data items. Empirical results show that our proposal yields semantically coherent outcomes that preserve data utility better than non-semantic perturbative mechanisms.
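
For orientation, the numerical baseline that semantic noise builds on can be sketched as follows. This is a minimal illustration of standard uncorrelated noise addition for a single numerical attribute, where each value x_i is replaced by x_i + e_i with e_i drawn from a normal distribution of mean 0 and variance alpha * Var(x). It does not reproduce the semantic variant proposed in the paper; the function name, the parameter name `alpha`, and the sample values are assumptions made for this example.

```python
import random
import statistics


def add_uncorrelated_noise(values, alpha=0.1, seed=None):
    """Mask a numerical attribute with z_i = x_i + e_i, e_i ~ N(0, alpha * Var(x)).

    `alpha` (hypothetical name) controls the noise level relative to the
    variance of the original attribute.
    """
    rng = random.Random(seed)
    variance = statistics.pvariance(values)   # population variance of the attribute
    noise_sd = (alpha * variance) ** 0.5      # standard deviation of the added noise
    return [x + rng.gauss(0.0, noise_sd) for x in values]


# Illustrative values only, not data from the paper.
ages = [23, 35, 41, 52, 67]
masked_ages = add_uncorrelated_noise(ages, alpha=0.1, seed=0)
```

Because the noise variance is proportional to the attribute variance, a larger `alpha` gives stronger masking at the cost of utility. The noise for each attribute is drawn independently, which is what "uncorrelated" refers to.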
