Using Dependency Parsing and Probabilistic Inference to Extract Relationships between Genes, Proteins and Malignancies Implicit Among Multiple Biomedical Research Abstracts

We describe BioLiterate, a prototype software system that infers relationships among genes, proteins and malignancies from research abstracts and has initially been tested in the domain of the molecular genetics of oncology. The architecture uses a natural language processing (NLP) module to extract entities, dependencies and simple semantic relationships from texts, then feeds these features into a probabilistic reasoning module, which combines the extracted semantic relationships to form new ones. One application of this system is the discovery of relationships that are not stated in any individual abstract but are implicit in the combined knowledge of two or more abstracts.
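
The idea of combining relationships from separate abstracts can be illustrated with a minimal sketch. It is not BioLiterate's reasoning module: the `Relation` record, the `infer_across_abstracts` function, the single chaining rule, the confidence values, and the placeholder entity names are all assumptions made for illustration, and multiplying confidences assumes the two premises are independent.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Relation:
    subject: str
    predicate: str
    obj: str
    confidence: float   # hypothetical extraction confidence in [0, 1]
    sources: frozenset  # identifiers of the abstracts supporting the relation


def infer_across_abstracts(relations):
    """Chain (A p B) and (B p C) from different abstracts into (A p C)."""
    inferred = []
    for first, second in permutations(relations, 2):
        chains = (first.obj == second.subject
                  and first.predicate == second.predicate)
        cross_abstract = first.sources.isdisjoint(second.sources)
        if chains and cross_abstract and first.subject != second.obj:
            inferred.append(Relation(
                subject=first.subject,
                predicate=first.predicate,
                obj=second.obj,
                # Assumption: premises are independent, so confidences multiply.
                confidence=first.confidence * second.confidence,
                sources=first.sources | second.sources,
            ))
    return inferred


# Placeholder relations standing in for the output of an NLP extraction step.
extracted = [
    Relation("GeneA", "influences", "ProteinB", 0.9, frozenset({"abstract-1"})),
    Relation("ProteinB", "influences", "MalignancyC", 0.8, frozenset({"abstract-2"})),
]
new_relations = infer_across_abstracts(extracted)
```

Here the inferred relation links `GeneA` to `MalignancyC`, which neither placeholder abstract states on its own. A fuller probabilistic reasoner would use more than one inference rule and would not rely on a plain independence assumption.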
