Using a Search Engine-Based Mutually Reinforcing Approach to Assess the Semantic Relatedness of Biomedical Terms

Background: Determining the semantic relatedness of two biomedical terms is an important task for many text-mining applications in the biomedical field. Previous studies, such as those using ontology-based and corpus-based approaches, measured semantic relatedness by using information from the structure of biomedical literature, but these methods are limited by the small size of training resources. To increase the size of training datasets, the outputs of search engines have been used extensively to analyze the lexical patterns of biomedical terms.

Methodology/Principal Findings: In this work, we propose the Mutually Reinforcing Lexical Pattern Ranking (ReLPR) algorithm for learning and exploring the lexical patterns of synonym pairs in biomedical text. ReLPR employs lexical patterns and their pattern containers to assess the semantic relatedness of biomedical terms. By combining sentence structures and the linking activities between containers and lexical patterns, our algorithm can explore the correlation between two biomedical terms.

Conclusions/Significance: The average correlation coefficient of the ReLPR algorithm was 0.82 for various datasets. The results of the ReLPR algorithm were significantly superior to those of previous methods.
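
The abstract does not spell out how containers and lexical patterns reinforce each other, so the following is only an illustrative sketch, not the ReLPR update rule itself. It assumes a HITS-style iteration over a bipartite graph: a pattern scores highly when it occurs in highly scored containers, and a container scores highly when it holds highly scored patterns. The function names, the `links` input format, and the example patterns are all hypothetical.

```python
import math


def _normalize(scores):
    """Scale scores to unit Euclidean length (a zero vector is left as-is)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {key: value / norm for key, value in scores.items()}


def mutually_reinforce(links, iterations=50):
    """Rank containers and patterns with a HITS-style iteration (assumed scheme).

    links: iterable of (container_id, pattern) pairs, one pair per
           occurrence of a lexical pattern inside a container.
    Returns (container_scores, pattern_scores) as dictionaries.
    """
    links = set(links)
    container_scores = {c: 1.0 for c, _ in links}
    pattern_scores = {p: 1.0 for _, p in links}

    for _ in range(iterations):
        # A pattern accumulates the scores of the containers it appears in.
        new_patterns = {p: 0.0 for p in pattern_scores}
        for container, pattern in links:
            new_patterns[pattern] += container_scores[container]

        # A container accumulates the updated scores of the patterns it holds.
        new_containers = {c: 0.0 for c in container_scores}
        for container, pattern in links:
            new_containers[container] += new_patterns[pattern]

        pattern_scores = _normalize(new_patterns)
        container_scores = _normalize(new_containers)

    return container_scores, pattern_scores


# Hypothetical data: three snippets (containers) and the patterns found in them.
example_links = [
    ("snippet1", "X also known as Y"),
    ("snippet1", "X ( Y )"),
    ("snippet2", "X also known as Y"),
    ("snippet3", "X and Y"),
]
container_scores, pattern_scores = mutually_reinforce(example_links)
ranked_patterns = sorted(pattern_scores, key=pattern_scores.get, reverse=True)
```

In this toy input, the pattern shared by two containers ends up ranked first, which is the qualitative behaviour a mutually reinforcing ranking is meant to produce. How ReLPR combines such scores with sentence structure to produce a relatedness value is not stated in the abstract and is not modelled here.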
