A WordNet-based semantic similarity measurement combining edge-counting and information content theory

Abstract: Measuring semantic similarity between words is useful in many fields, such as artificial intelligence, information processing, medical care and linguistics. In this paper, we present a new approach to semantic similarity measurement that combines edge-counting with information content theory. Specifically, the proposed measure applies a nonlinear transformation to the weighted shortest path length between the compared concepts to obtain the similarity value, and we discuss in detail how the parameters affect the correlation with human judgments. Experimental results show that the proposed approach not only achieves a high correlation with human ratings but also yields a better-behaved distribution of the correlation coefficient than several related measures in the literature. In addition, the proposed method is computationally efficient because it weights the shortest path length between concept pairs in a simplified way.
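The general idea of nonlinearly transforming a weighted shortest path length can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from the paper: the toy taxonomy, the intrinsic information content estimate based on hyponym counts, the edge weight `1 + |IC(child) - IC(parent)|`, the exponential transform, and the parameter `alpha`.

```python
import math

# Toy is-a taxonomy (child -> parent). Hypothetical data, not WordNet.
PARENT = {
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "oak": "plant",
}
NODES = set(PARENT) | set(PARENT.values())


def ancestors(concept):
    """Return the chain [concept, parent, ..., root]."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def intrinsic_ic(concept):
    """Assumed IC estimate: 1 - log(hyponyms + 1) / log(total nodes)."""
    hypo = sum(1 for n in NODES if n != concept and concept in ancestors(n))
    return 1.0 - math.log(hypo + 1) / math.log(len(NODES))


def weighted_path_length(a, b):
    """Sum assumed edge weights along the path from a to b through
    their lowest common subsumer (the shortest path in a tree)."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    lcs = next(n for n in chain_a if n in chain_b)

    def climb(chain):
        total = 0.0
        for child, parent in zip(chain, chain[1:]):
            if child == lcs:
                break
            # Assumed weight: one unit per edge plus the IC gap across it.
            total += 1.0 + abs(intrinsic_ic(child) - intrinsic_ic(parent))
        return total

    return climb(chain_a) + climb(chain_b)


def similarity(a, b, alpha=0.25):
    """Assumed nonlinear transform: exponential decay of path length."""
    return math.exp(-alpha * weighted_path_length(a, b))


if __name__ == "__main__":
    for pair in [("dog", "cat"), ("dog", "oak"), ("dog", "dog")]:
        print(pair, round(similarity(*pair), 3))
```

In this sketch, `alpha` controls how quickly similarity decays with distance, which is the kind of parameter whose effect on correlation with human ratings would need to be tuned and examined; the value 0.25 is arbitrary.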
