Using Natural Language Processing, LocusLink and the Gene Ontology to Compare OMIM to MEDLINE

Researchers in biomedicine and molecular biology face a wide variety of information sources. These take the form of images, free text, and structured data files that include medical records, gene and protein sequence data, and whole-genome microarray data, gathered from many experimental organisms and clinical subjects. The need to organize and relate this information, particularly information about genes, has motivated the development of resources such as the Unified Medical Language System, the Gene Ontology, LocusLink, and the Online Mendelian Inheritance in Man (OMIM) database. We describe a natural language processing application that extracts information on genes from unstructured text, and we discuss ways to integrate this information with some of the available online resources.
