A Hybrid Semantic Relatedness Algorithm by Entity Co-Occurrence and Specialized Word Embeddings

As the biomedical literature grows, measuring the semantic relatedness between two entities has become an important task for finding meaningful biological relationships. In this paper, we propose a hybrid semantic relatedness algorithm for biomedical knowledge discovery. The algorithm combines a co-occurrence approach, which captures hidden relationships, with specialized word embeddings that account for both direct and indirect entity relationships. We evaluate the proposed method against other well-accepted methods, such as co-occurrence, Word2Vec, COALS, and random indexing, by comparing the top entities that each method relates to Alzheimer’s disease. In addition, we conduct a series of analyses, including gene, pathway, and gene-phenotype relationship analyses.
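
To make the idea of a hybrid score concrete, the following is a minimal sketch, not the method evaluated in this paper. It assumes that the co-occurrence signal is a document-level normalized pointwise mutual information (NPMI), that the embedding signal is a plain cosine similarity, and that the two are blended linearly with a weight `alpha`. The function names, the `alpha` parameter, and the blending rule are all illustrative assumptions; the abstract does not specify how the two components are actually combined.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count, per document, each entity and each unordered entity pair."""
    entity_counts = Counter()
    pair_counts = Counter()
    for doc in docs:
        entities = set(doc)  # count an entity once per document
        entity_counts.update(entities)
        pair_counts.update(
            frozenset(pair) for pair in combinations(sorted(entities), 2)
        )
    return entity_counts, pair_counts


def npmi(a, b, entity_counts, pair_counts, n_docs):
    """Normalized PMI in [-1, 1]; -1 means the pair never co-occurs."""
    joint = pair_counts[frozenset((a, b))]
    if joint == 0:
        return -1.0
    p_ab = joint / n_docs
    if p_ab == 1.0:  # pair appears in every document
        return 1.0
    p_a = entity_counts[a] / n_docs
    p_b = entity_counts[b] / n_docs
    return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def hybrid_relatedness(a, b, entity_counts, pair_counts, n_docs, vectors, alpha=0.5):
    """Assumed linear blend: alpha * NPMI + (1 - alpha) * cosine similarity."""
    co = npmi(a, b, entity_counts, pair_counts, n_docs)
    emb = cosine(vectors[a], vectors[b])
    return alpha * co + (1 - alpha) * emb
```

In this sketch, each document is a list of entity names already extracted from text, and `vectors` maps each entity to a pre-trained embedding. Setting `alpha` to 1 reduces the score to pure co-occurrence, and setting it to 0 reduces it to pure embedding similarity.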
