Dragon TF Association Miner: a system for exploring transcription factor associations through text-mining

We present Dragon TF Association Miner (DTFAM), a system that text-mines PubMed documents for potential functional associations of transcription factors (TFs) with Gene Ontology (GO) terms and with diseases. The document-selection component of DTFAM was trained and tested on a manually curated dataset of more than 3000 PubMed abstracts relevant to transcription control. On our test data the system achieves a sensitivity of 80% and a specificity of 82%. DTFAM provides comprehensive tabular and graphical reports that link terms to the relevant sets of documents, and the documents are color-coded for easier inspection. DTFAM complements existing biological resources by collecting, assessing, extracting and presenting associations among the entities it finds. These associations can reveal connections that are not easily observed otherwise, which may help explain the functions of TFs and decipher parts of gene transcriptional regulatory networks. By summarizing information from a large volume of documents, DTFAM saves time and simplifies analysis for individual users. DTFAM is freely available for academic and non-profit users at http://research.i2r.a-star.edu.sg/DRAGON/TFAM/.
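For readers less familiar with the two reported measures, the following minimal sketch shows how sensitivity and specificity are computed from the outcome counts of a binary relevant/irrelevant document classifier. The function names and the counts are illustrative assumptions chosen only to reproduce the reported percentages; they are not the actual test-set counts, which are not given here.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly relevant documents that were selected."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of truly irrelevant documents that were rejected."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts (assumed for illustration, not taken from the paper):
# 100 relevant abstracts, 80 of them selected; 100 irrelevant, 82 rejected.
sens = sensitivity(true_pos=80, false_neg=20)
spec = specificity(true_neg=82, false_pos=18)

print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

With these assumed counts, one in five relevant abstracts is missed and slightly fewer than one in five irrelevant abstracts is wrongly selected.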
