Extracting Molecular Binding Relationships from Biomedical Text

ARBITER is a Prolog program that extracts assertions about macromolecular binding relationships from biomedical text. We describe the domain knowledge and the underspecified linguistic analyses that support the identification of these predications. After discussing a formal evaluation of ARBITER, we report on its application to 491,000 MEDLINE® abstracts, during which almost 25,000 binding relationships suitable for entry into a database of macromolecular function were extracted.
