Contributions on Semantic Similarity and Its Applications to Data Privacy

Semantic similarity aims at quantifying the resemblance between the meaning of textual terms. Thus, it represents the corner stone of textual understanding. Given the increasing availability and importance of textual sources within the current context of Information Societies, a lot of attention has been put in recent years in the development of mechanisms to automatically measure semantic similarity and to apply them to tasks dealing with textual inputs (e.g. document classification, information retrieval, question answering, privacy-protection, etc.). This chapter offers describes and discusses recent findings and proposals published by the authors on semantic similarity. Moreover, it also details recent works applying semantic similarity to privacy protection of textual data.

[1]  David Sánchez,et al.  Automatic General-Purpose Sanitization of Textual Documents , 2013, IEEE Transactions on Information Forensics and Security.

[2]  A. Blank Words and Concepts in Time: towards Diachronic Cognitive Onomasiology , 2001 .

[3]  Jens Lehmann,et al.  DBpedia: A Nucleus for a Web of Open Data , 2007, ISWC/ASWC.

[4]  Tony Veale,et al.  An Intrinsic Information Content Metric for Semantic Similarity in WordNet , 2004, ECAI.

[5]  Asunción Gómez-Pérez,et al.  Ontological Engineering: With Examples from the Areas of Knowledge Management, e-Commerce and the Semantic Web , 2004, Advanced Information and Knowledge Processing.

[6]  Gerrit Antonides Evaluation and Applications , 1990 .

[7]  Cynthia Dwork,et al.  Differential Privacy , 2006, ICALP.

[8]  Zellig S. Harris,et al.  Distributional Structure , 1954 .

[9]  David Sánchez,et al.  Ontology-based semantic similarity: A new feature-based approach , 2012, Expert Syst. Appl..

[10]  David Sánchez,et al.  Semantically-grounded construction of centroids for datasets with textual attributes , 2012, Knowl. Based Syst..

[11]  Vicenç Torra,et al.  Towards a private vector space model for confidential documents , 2013, SAC '13.

[12]  Josep Domingo-Ferrer,et al.  Improving the Utility of Differentially Private Data Releases via k-Anonymity , 2013, 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications.

[13]  Hassan J. Eghbali,et al.  K-S Test for Detecting Changes from Landsat Imagery Data , 1979, IEEE Transactions on Systems, Man, and Cybernetics.

[14]  Josep Domingo-Ferrer,et al.  Statistical Disclosure Control , 2012 .

[15]  David Sánchez,et al.  Privacy protection of textual attributes through a semantic-based masking method , 2012, Inf. Fusion.

[16]  Josep Domingo-Ferrer,et al.  Enhancing data utility in differential privacy via microaggregation-based k\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{docume , 2014, The VLDB Journal.

[17]  Josep Domingo-Ferrer,et al.  Ordinal, Continuous and Heterogeneous k-Anonymity Through Microaggregation , 2005, Data Mining and Knowledge Discovery.

[18]  David W. Conrath,et al.  Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy , 1997, ROCLING/IJCLCLP.

[19]  Steffen Staab,et al.  The Karlsruhe view on ontologies , 2003 .

[20]  Danushka Bollegala,et al.  A Relational Model of Semantic Similarity between Words using Automatically Extracted Lexical Pattern Clusters from the Web , 2009, EMNLP.

[21]  David Sánchez,et al.  A New Model to Compute the Information Content of Concepts from Taxonomic Knowledge , 2012, Int. J. Semantic Web Inf. Syst..

[22]  Irene M. Cramer,et al.  From Social Networks To Distributional Properties: A Comparative Study On Computing Semantic Relatedness , 2009 .

[23]  Philip Resnik,et al.  Using Information Content to Evaluate Semantic Similarity in a Taxonomy , 1995, IJCAI.

[24]  Giuseppe Pirrò,et al.  A semantic similarity metric combining features and intrinsic information content , 2009, Data Knowl. Eng..

[25]  Panos Kalnis,et al.  Privacy-preserving anonymization of set-valued data , 2008, Proc. VLDB Endow..

[26]  David Sánchez,et al.  Towards k-Anonymous Non-numerical Data via Semantic Resampling , 2012, IPMU.

[27]  David Sánchez,et al.  EVALUATION OF THE DISCLOSURE RISK OF MASKING METHODS DEALING WITH TEXTUAL ATTRIBUTES , 2012 .

[28]  David Sánchez,et al.  Semantic adaptive microaggregation of categorical microdata , 2012, Comput. Secur..

[29]  Hisham Al-Mubaid,et al.  Measuring Semantic Similarity Between Biomedical Concepts Within Multiple Ontologies , 2009, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[30]  Vicenç Torra Towards Knowledge Intensive Data Privacy , 2010, DPM/SETOP.

[31]  J. Katz,et al.  The philosophy of linguistics , 1989 .

[32]  Dekang Lin,et al.  An Information-Theoretic Definition of Similarity , 1998, ICML.

[33]  Martha Palmer,et al.  Verb Semantics and Lexical Selection , 1994, ACL.

[34]  David Sánchez,et al.  Ontology-driven web-based semantic similarity , 2010, Journal of Intelligent Information Systems.

[35]  Ted Pedersen,et al.  Using WordNet-based Context Vectors to Estimate the Semantic Relatedness of Concepts , 2006 .

[36]  David Sánchez,et al.  Enabling semantic similarity estimation across multiple ontologies: An evaluation in the biomedical domain , 2012, J. Biomed. Informatics.

[37]  Graeme Hirst,et al.  Evaluating WordNet-based Measures of Lexical Semantic Relatedness , 2006, CL.

[38]  Christiane Fellbaum,et al.  Combining Local Context and Wordnet Similarity for Word Sense Identification , 1998 .

[39]  Peter D. Turney Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL , 2001, ECML.

[40]  Euripides G. M. Petrakis,et al.  X-Similarity: Computing Semantic Similarity between Concepts from Different Ontologies , 2006, J. Digit. Inf. Manag..

[41]  David Sánchez,et al.  Minimizing the disclosure risk of semantic correlations in document sanitization , 2013, Inf. Sci..

[42]  Max J. Egenhofer,et al.  Determining Semantic Similarity among Entity Classes from Different Ontologies , 2003, IEEE Trans. Knowl. Data Eng..

[43]  David Sánchez,et al.  A semantic framework to protect the privacy of electronic health records with non-numerical attributes , 2013, J. Biomed. Informatics.

[44]  A. Tversky Features of Similarity , 1977 .

[45]  Timothy W. Finin,et al.  Enabling Technology for Knowledge Sharing , 1991, AI Mag..

[46]  Montserrat Batet,et al.  Semantic Anonymisation of Set-valued Data , 2014, ICAART.

[47]  David Sánchez,et al.  Ontology-based information content computation , 2011, Knowl. Based Syst..

[48]  Timothy W. Finin,et al.  Swoogle: a search and metadata engine for the semantic web , 2004, CIKM '04.

[49]  David Sánchez,et al.  An ontology-based measure to compute semantic similarity in biomedicine , 2011, J. Biomed. Informatics.

[50]  Montserrat Batet,et al.  Ontology-based semantic clustering , 2011, AI Commun..

[51]  Josep Domingo-Ferrer,et al.  A Survey of Inference Control Methods for Privacy-Preserving Data Mining , 2008, Privacy-Preserving Data Mining.

[52]  Daniel Abril,et al.  Document Sanitization: Measuring Search Engine Information Loss and Risk of Disclosure for the Wikileaks cables , 2012, Privacy in Statistical Databases.

[53]  Junzhong Gu,et al.  A New Model of Information Content for Semantic Similarity in WordNet , 2008, 2008 Second International Conference on Future Generation Communication and Networking Symposia.

[54]  David McLean,et al.  An Approach for Measuring Semantic Similarity between Words Using Multiple Information Sources , 2003, IEEE Trans. Knowl. Data Eng..

[55]  Rakesh Agrawal,et al.  Privacy-preserving data mining , 2000, SIGMOD 2000.

[56]  Philipp Cimiano,et al.  Ontology learning and population from text - algorithms, evaluation and applications , 2006 .

[57]  Rafal A. Angryk,et al.  Measuring semantic similarity using wordnet-based context vectors , 2007, 2007 IEEE International Conference on Systems, Man and Cybernetics.

[58]  Josep Domingo-Ferrer,et al.  Anonymization of nominal data based on semantic marginality , 2013, Inf. Sci..

[59]  Josep Domingo-Ferrer,et al.  Statistical Disclosure Control: Hundepool/Statistical Disclosure Control , 2012 .

[60]  Mehran Sahami,et al.  A web-based kernel function for measuring the similarity of short text snippets , 2006, WWW '06.

[61]  Roy Rada,et al.  Development and application of a metric on semantic nets , 1989, IEEE Trans. Syst. Man Cybern..

[62]  Benoît Lemaire,et al.  Effects of High-Order Co-occurrences on Word Semantic Similarities , 2006, ArXiv.

[63]  Ted Pedersen,et al.  Extended Gloss Overlaps as a Measure of Semantic Relatedness , 2003, IJCAI.

[64]  David Sánchez,et al.  A semantic similarity method based on information content exploiting multiple ontologies , 2013, Expert Syst. Appl..

[65]  David Sánchez,et al.  Semantic similarity estimation from multiple ontologies , 2012, Applied Intelligence.

[66]  Pierangela Samarati,et al.  Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression , 1998 .

[67]  Montserrat Batet,et al.  Utility preserving query log anonymization via semantic microaggregation , 2013, Inf. Sci..

[68]  Paul M. B. Vitányi,et al.  The Google Similarity Distance , 2004, IEEE Transactions on Knowledge and Data Engineering.