Extracting Gene Regulation Networks Using Linear-Chain Conditional Random Fields and Rules

Published literature in molecular genetics collectively contains a great deal of information on gene regulation networks. Dedicated computational approaches are required to sift through large volumes of text and infer gene interactions. We propose a novel sieve-based relation extraction system that combines linear-chain conditional random fields with rules. We also introduce a new skip-mention data representation that enables first-order models to extract relations between distant mentions. To account for the variety of relation types, multiple models are inferred. The system was applied to the BioNLP 2013 Gene Regulation Network Shared Task, where it ranked first of five with a slot error rate of 0.73.
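The skip-mention idea can be illustrated with a short sketch. The abstract does not define the representation, so the code below shows only one plausible reading: the ordered mentions of a document are split into interleaved sequences so that mentions originally k positions apart become neighbours, which is the only kind of dependency a first-order linear-chain model can capture. The function name `skip_mention_sequences`, the `skip` parameter, and the example gene names are assumptions made for illustration and are not taken from the system itself.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def skip_mention_sequences(mentions: Sequence[T], skip: int) -> List[List[T]]:
    """Split an ordered mention list into interleaved sequences.

    Hypothetical helper: with skip = 0 the original sequence is returned
    unchanged; with skip = k, each resulting sequence holds every
    (k + 1)-th mention, so mentions that were k + 1 positions apart
    end up adjacent.
    """
    if skip < 0:
        raise ValueError("skip must be non-negative")
    step = skip + 1
    # One sequence per possible starting offset; never more than len(mentions).
    return [list(mentions[start::step]) for start in range(min(step, len(mentions)))]


# Illustrative mention order only; not data from the shared task.
mentions = ["sigK", "gerE", "cotA", "spoIIID", "sigE"]
one_skip = skip_mention_sequences(mentions, skip=1)
```

Under this reading, a separate linear-chain model could be trained on the sequences produced for each skip value, and their predictions merged afterwards, so that distant pairs are covered without moving to a higher-order model.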
