Extracting Information for Meaningful Function Inference through Text-Mining

Text mining, which includes natural language processing, is an emerging technology in computational biology. It enables the automated extraction of relevant biological knowledge from large volumes of scientific literature. We present several systems that cover different facets of mining biological information from text, with applications in transcriptional control, metabolic pathways, and cross-species comparison of bacteria. We demonstrate how this technology can efficiently help biologists and medical scientists infer the function of biological entities. It saves them considerable time and paves the way for more focused and detailed follow-up research.
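The following is a rough illustration of automated extraction in its simplest form. It counts how often a gene name and a function term appear in the same sentence, a common baseline in literature mining. It is a generic sketch and not one of the systems presented here. The dictionaries `GENES` and `FUNCTION_TERMS`, the gene names, and the helper functions are all hypothetical.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical toy dictionaries; a real system would draw on curated
# resources instead of hand-written sets.
GENES = {"GENE1", "GENE2"}
FUNCTION_TERMS = {"kinase", "transcription factor"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cooccurrences(text):
    """Count (gene, term) pairs that appear together in the same sentence."""
    counts = Counter()
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        # Whole-word match for gene symbols so GENE1 does not match GENE10.
        genes = {g for g in GENES if re.search(rf"\b{re.escape(g)}\b", sentence)}
        # Simple case-insensitive substring match for function terms.
        terms = {t for t in FUNCTION_TERMS if t in lowered}
        for pair in product(sorted(genes), sorted(terms)):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    text = ("GENE1 encodes a kinase. "
            "GENE2 is a transcription factor that binds GENE1. "
            "GENE1 kinase activity is high.")
    for (gene, term), n in sorted(cooccurrences(text).items()):
        print(gene, term, n)
```

Pairs that co-occur repeatedly across many abstracts are only candidate associations for a curator to check. Practical systems generally add named-entity recognition and statistical scoring on top of raw counts.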
