@MInter: automated text-mining of microbial interactions

MOTIVATION: Microbial consortia are frequently defined by numerous interactions within the community that are key to understanding their function. While microbial interactions have been extensively studied experimentally, information regarding them is dispersed in the scientific literature. As manual collation is an infeasible option, automated data processing tools are needed to make this information easily accessible.

RESULTS: We present @MInter, an automated information extraction system based on Support Vector Machines to analyze paper abstracts and infer microbial interactions. @MInter was trained and tested on a manually curated gold standard dataset of 735 species interactions and 3917 annotated abstracts, constructed as part of this study. Cross-validation analysis showed that @MInter was able to detect abstracts pertaining to one or more microbial interactions with high specificity (specificity = 95%, AUC = 0.97). Despite challenges in identifying specific microbial interactions in an abstract (interaction level recall = 95%, precision = 25%), @MInter was shown to reduce annotator workload 13-fold compared to alternate approaches. Applying @MInter to 175 bacterial species abundant on human skin, we identified a network of 357 literature-reported microbial interactions, demonstrating its utility for the study of microbial communities.

AVAILABILITY AND IMPLEMENTATION: @MInter is freely available at https://github.com/CSB5/atminter

CONTACT: nagarajann@gis.a-star.edu.sg

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
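
The interaction-level figures quoted in the abstract (recall = 95%, precision = 25%) can be read more concretely with a small worked example. The sketch below is not part of @MInter. The class `ConfusionCounts`, the helper functions, and the counts themselves are hypothetical, chosen only so that the arithmetic reproduces the two reported rates. It shows that, at this operating point, a curator would review roughly three false candidates for each true interaction while missing about one true interaction in twenty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical counts of candidate interactions flagged by a classifier."""
    tp: int  # true interactions that were flagged
    fp: int  # flagged candidates that are not true interactions
    fn: int  # true interactions that were missed


def recall(c: ConfusionCounts) -> float:
    """Fraction of true interactions that were flagged."""
    return c.tp / (c.tp + c.fn)


def precision(c: ConfusionCounts) -> float:
    """Fraction of flagged candidates that are true interactions."""
    return c.tp / (c.tp + c.fp)


def false_positives_per_hit(c: ConfusionCounts) -> float:
    """Number of false candidates reviewed per true interaction found."""
    return c.fp / c.tp


# Illustrative counts only (an assumption, not data from the study):
# 20 true interactions, 19 of them flagged, alongside 57 false candidates.
counts = ConfusionCounts(tp=19, fp=57, fn=1)

if __name__ == "__main__":
    print(f"recall    = {recall(counts):.2f}")
    print(f"precision = {precision(counts):.2f}")
    print(f"false positives per true hit = {false_positives_per_hit(counts):.1f}")
```

With these assumed counts, recall is 19/20 and precision is 19/76, matching the reported 95% and 25%.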
