BactInt: A domain driven transfer learning approach and a corpus for extracting inter-bacterial interactions from biomedical text

The community of microbes present in a biological niche plays an important role in the functioning of that system. Crosstalk, or interaction, among the different microbes forms the building blocks of such microbial community structures. Evidence reported in biomedical text serves as a reliable source for predicting such interactions. However, going through the vast and ever-increasing volume of biomedical literature is an intimidating and time-consuming process. This necessitates the development of automated methods capable of accurately extracting bacterial relations reported in biomedical literature. In this paper, we introduce a method for automated extraction of microbial interactions (specifically between bacteria) from biomedical literature, along with ways of using transfer learning to improve its accuracy. We also describe a pipeline with which relations among specific bacteria groups can be mined. Additionally, we introduce the first publicly available dataset that can be used to develop bacterial interaction extraction methods.
