Integrating Large-Scale Text Mining and Co-Expression Networks: Targeting NADP(H) Metabolism in E. coli with Event Extraction

We present an application of EVEX, a literature-scale event extraction resource, to a concrete biological use case: the regulation of NADP(H) metabolism in Escherichia coli. We make extensive use of the EVEX event generalization, which is based on gene family definitions in Ensembl Genomes, to extract cross-species candidate regulators. We manually evaluate the resulting network to retain only correct events and to facilitate its integration with microarray-based co-expression data. When analysing the combined network obtained from text mining and co-expression, we identify 41 candidate genes that participate in triangular patterns spanning both subnetworks. Several of these candidates are of particular interest, and we discuss their biological relevance further. This study is the first to present a real-world evaluation of the EVEX resource in particular, and of the literature-scale application of systems emerging from the BioNLP Shared Task series in general. We summarize the lessons learned from this use case to focus future development of EVEX and similar literature-scale resources.
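The triangular patterns mentioned above can be illustrated with a minimal sketch. It assumes that a pattern is any set of three genes that are pairwise connected in the combined network, with at least one edge supported by text mining and at least one by co-expression. That definition, the function name `mixed_triangles`, and the toy gene names are assumptions made for illustration; they are not taken from the study or from the EVEX resource.

```python
from itertools import combinations


def mixed_triangles(text_edges, coexp_edges):
    """Return triangles that contain at least one edge from each subnetwork.

    Both arguments are iterables of (gene, gene) pairs, treated as undirected.
    """
    # Map each undirected edge to the set of subnetworks that support it.
    sources = {}
    for label, edges in (("text", text_edges), ("coexp", coexp_edges)):
        for a, b in edges:
            sources.setdefault(frozenset((a, b)), set()).add(label)

    # Build adjacency sets for the combined network, ignoring self-loops.
    neighbours = {}
    for edge in sources:
        if len(edge) != 2:
            continue
        a, b = tuple(edge)
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Brute-force scan over gene triples; adequate for small networks only.
    triangles = []
    for a, b, c in combinations(sorted(neighbours), 3):
        if b in neighbours[a] and c in neighbours[a] and c in neighbours[b]:
            labels = set()
            for pair in ((a, b), (a, c), (b, c)):
                labels |= sources[frozenset(pair)]
            if labels == {"text", "coexp"}:
                triangles.append((a, b, c))
    return triangles


# Toy data with hypothetical gene names (not results from the study).
text_edges = [("geneA", "geneB"), ("geneB", "geneC")]
coexp_edges = [
    ("geneA", "geneC"),
    ("geneD", "geneE"),
    ("geneE", "geneF"),
    ("geneD", "geneF"),
]

triangles = mixed_triangles(text_edges, coexp_edges)
candidates = sorted({gene for triangle in triangles for gene in triangle})
print(triangles)
print(candidates)
```

In this toy input, the triangle formed by geneD, geneE and geneF is supported by co-expression alone, so it is excluded; only the triangle that mixes both evidence types is reported. The scan over all triples is cubic in the number of genes, so a larger network would call for a neighbour-intersection approach or a graph library.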
