Extracting rate changes in transcriptional regulation from MEDLINE abstracts

Background: Time delays are important factors that are often neglected in gene regulatory network (GRN) inference models. Validating time delays against knowledge bases is challenging because the vast majority of biological databases do not record temporal information about gene regulation. Biological knowledge and facts about gene regulation are typically extracted from the literature with specialized methods that depend on the regulation task. In this paper, we mine evidence for time delays related to the transcriptional regulation of yeast from PubMed abstracts.

Results: Since the vast majority of abstracts lack quantitative time information, we can only collect qualitative evidence of time delays. Specifically, a speed-up or slow-down in the rate of transcriptional regulation can provide evidence for shorter or longer time delays in a GRN. Thus, we focus on deriving events related to rate changes in transcriptional regulation. A corpus of abstracts related to yeast regulation was manually labeled with such events. To capture these events automatically, we create an ontology of sub-processes that are likely to result in transcription rate changes by combining textual patterns and biological knowledge. We also propose feature extraction methods based on the created ontology to identify direct evidence of these events together with their specific details. When used with the automatic rule learning method on our corpus, our ontologies outperform existing state-of-the-art gene regulation ontologies. The proposed deterministic ontology rule-based method achieves performance comparable to that of the automatic rule learning method based on decision trees, which demonstrates the effectiveness of our ontology in identifying rate-changing events. We also tested the effectiveness of the proposed feature mining methods in detecting direct evidence of events. Experimental results show that the machine learning method using these features achieves an F1-score of 71.43%.

Conclusions: The manually labeled corpus of events relating to rate changes in transcriptional regulation for yeast is available at https://sites.google.com/site/wentingntu/data. The created ontologies summarize both the biological causes of rate changes in transcriptional regulation and the corresponding positive and negative textual patterns from the corpus. They are shown to be effective in identifying rate-changing events, which demonstrates the benefit of combining textual patterns and biological knowledge when extracting complex biological events.
