Literature Mining and Ontology based Analysis of Host-Brucella Gene–Gene Interaction Network

Brucella is an intracellular bacterium that causes chronic brucellosis in humans and various mammals. Identifying host-Brucella interactions is crucial for understanding host immunity against Brucella infection and Brucella pathogenesis against host immune responses. Most of the information about inter-species interactions between host and Brucella genes is available only in the text of scientific publications. Many text-mining systems for extracting gene and protein interactions have been proposed, but only a few have been designed with the peculiarities of host–pathogen interactions in mind. In this paper, we used a text-mining approach to extract host-Brucella gene–gene interactions from the abstracts of articles in PubMed. Here, gene–gene interactions are interactions between genes and/or gene products (e.g., proteins). The SciMiner tool, originally designed for detecting mammalian gene/protein names in text, was extended to identify host and Brucella gene/protein names in the abstracts. Next, sentence-level and abstract-level co-occurrence-based approaches, as well as sentence-level machine-learning-based methods, originally designed for extracting intra-species gene interactions, were used to extract interactions among the identified host and Brucella genes. The extracted interactions were manually evaluated. A total of 46 host-Brucella gene interactions were identified and represented as an interaction network. Twenty-four of these interactions were identified by sentence-level processing, and 22 additional interactions were identified when abstract-level processing was performed. The Interaction Network Ontology (INO) was used to represent the identified interaction types in a hierarchical ontology structure. Ontological modeling of specific gene–gene interactions demonstrates that host–pathogen gene–gene interactions occur under experimental conditions that can be ontologically represented.
Our results show that the introduced literature mining and ontology-based modeling approach is effective in retrieving and analyzing host–pathogen gene–gene interaction networks.
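
The difference between sentence-level and abstract-level co-occurrence can be illustrated with a minimal sketch. It is not the pipeline used in this work: gene mentions are found here by plain dictionary lookup rather than by SciMiner, the sentence splitter is deliberately naive, and the gene names (HOSTG1, HOSTG2, brxA, brxB) and the example text are hypothetical placeholders invented for illustration. The sketch only produces candidate pairs, which would still require manual evaluation.

```python
import re
from itertools import product

# Hypothetical placeholder dictionaries (not real gene symbols). A real system
# would use a gene/protein name tagger instead of exact dictionary lookup.
HOST_GENES = {"HOSTG1", "HOSTG2"}
BRUCELLA_GENES = {"brxA", "brxB"}


def find_genes(text, names):
    """Return the dictionary names that occur as whole tokens in the text."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", text))
    return {name for name in names if name in tokens}


def split_sentences(abstract):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", abstract.strip()) if s]


def cooccurring_pairs(text):
    """All (host gene, Brucella gene) pairs mentioned together in the text."""
    hosts = find_genes(text, HOST_GENES)
    pathogens = find_genes(text, BRUCELLA_GENES)
    return set(product(hosts, pathogens))


def extract_candidates(abstract):
    """Return (sentence-level pairs, pairs found only at abstract level)."""
    sentence_level = set()
    for sentence in split_sentences(abstract):
        sentence_level |= cooccurring_pairs(sentence)
    abstract_level = cooccurring_pairs(abstract)
    return sentence_level, abstract_level - sentence_level


# Synthetic example text, written only to exercise the functions.
example = (
    "Expression of HOSTG1 changed after infection with the brxA mutant. "
    "The brxB mutant was also examined. "
    "HOSTG2 was measured separately."
)
sentence_pairs, abstract_only_pairs = extract_candidates(example)
print("Sentence level:", sorted(sentence_pairs))
print("Abstract level only:", sorted(abstract_only_pairs))
```

In this toy input, only one pair shares a sentence, while three further pairs appear only when the whole abstract is treated as the co-occurrence window. Widening the window raises the number of candidates, which is why abstract-level results call for careful manual review.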
