Passage retrieval based hidden knowledge discovery from biomedical literature

Biomedical literature is growing at a double-exponential pace, and automatic extraction of implicit biological relationships from this literature helps build biomedical hypotheses that can then be explored experimentally. This paper presents a passage retrieval based method for discovering hidden connections in MEDLINE records. In this method, MeSH concepts are retrieved from sentence-level windows and are therefore more relevant to the starting term. The method is tested in the open discovery setting on three classical implicit connections: Alzheimer's disease and indomethacin, migraine and magnesium, and schizophrenia and calcium-independent phospholipase A2. Three computational methods for scoring and ranking the MeSH terms are explored: z-score, TFIDF (term frequency-inverse document frequency), and PMI (pointwise mutual information). Experimental results show that this method can significantly improve hidden knowledge discovery performance.
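As a rough illustration of the idea, the sketch below restricts candidate terms to sentences that mention the starting term and then scores them. It is a minimal sketch under several assumptions, not the paper's implementation: a "window" is taken to be a single sentence, terms are matched by plain substring lookup rather than MeSH indexing, the three scoring functions use their textbook forms (the abstract does not give the exact definitions used), and all function names and the toy documents are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter on sentence-final punctuation (assumption: good enough for a toy example)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_windows(docs, start_term):
    """Keep only the sentences that mention the starting term."""
    start = start_term.lower()
    return [s for d in docs for s in split_sentences(d) if start in s.lower()]


def count_terms(passages, vocabulary):
    """Count the number of passages in which each vocabulary term occurs."""
    counts = Counter()
    for passage in passages:
        low = passage.lower()
        for term in vocabulary:
            if term.lower() in low:
                counts[term] += 1
    return counts


def pmi(n_joint, n_term, n_start, n_total):
    """Textbook PMI: log2(P(term, start) / (P(term) * P(start))), estimated from counts."""
    if n_joint == 0:
        return float("-inf")
    return math.log2((n_joint * n_total) / (n_term * n_start))


def tfidf(tf, df, n_units):
    """Textbook TF-IDF: term frequency times log inverse document frequency."""
    return tf * math.log(n_units / df)


def z_score(value, background):
    """Standard score of a value against a list of background values."""
    mean = sum(background) / len(background)
    var = sum((x - mean) ** 2 for x in background) / len(background)
    return (value - mean) / math.sqrt(var) if var > 0 else 0.0


# Toy documents (invented for illustration, not MEDLINE records).
docs = [
    "Migraine patients showed spreading depression. Magnesium levels were not measured.",
    "Spreading depression is inhibited by magnesium.",
    "Migraine is associated with platelet aggregation.",
]
vocabulary = ["spreading depression", "platelet aggregation", "magnesium"]

all_sentences = [s for d in docs for s in split_sentences(d)]
passages = sentence_windows(docs, "migraine")
joint = count_terms(passages, vocabulary)        # co-occurrence with the starting term
overall = count_terms(all_sentences, vocabulary)  # occurrence anywhere in the collection

# Rank candidate terms by PMI with the starting term, highest first.
ranking = sorted(
    vocabulary,
    key=lambda t: pmi(joint[t], overall[t], len(passages), len(all_sentences)),
    reverse=True,
)
print(ranking)
```

In the toy data, "magnesium" shares a document with "migraine" but never a sentence, so the sentence-level window excludes it as a direct co-occurring term. It would only be reached in a second step through an intermediate term such as "spreading depression", which is the pattern open discovery looks for.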
