Multiple instance learning for predicting necrotizing enterocolitis in premature infants using microbiome data

Necrotizing enterocolitis (NEC) is a life-threatening intestinal disease that primarily affects preterm infants during their first weeks after birth. Mortality rates associated with NEC are 15-30%, and surviving infants are susceptible to multiple serious, long-term complications. The disease is sporadic and, with currently available tools, unpredictable. We are developing an early warning system that combines stool microbiome features with clinical and demographic information to identify infants at high risk of developing NEC. Our approach is a neural network-based multiple instance learning (MIL) system that could generate daily or weekly NEC predictions for premature infants. We selected MIL because it makes effective use of the sparse and weakly annotated datasets characteristic of stool microbiome analysis. Here we describe the initial validation of our system, using clinical and microbiome data from a nested case-control study of 161 preterm infants. We report areas under the receiver operating characteristic curve above 0.9, with 75% of the dominant predictive samples for NEC-affected infants identified at least 24 hours before disease onset. Our results pave the way for a real-time early warning system for NEC that uses a limited set of basic clinical and demographic details combined with stool microbiome data.
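
To make the multiple instance learning framing concrete, the sketch below treats each infant as a bag of stool samples and produces one infant-level risk score plus a per-sample weight. It is an illustration only, not the system described above: the attention-style pooling, the function name `attention_mil_forward`, the parameter names, the feature dimensions, and the random (untrained) weights are all assumptions chosen for the example.

```python
import math
import random


def attention_mil_forward(bag, V, w, clf_w, clf_b):
    """Score one bag (one infant) of sample feature vectors.

    bag: list of samples, each a list of n_features floats (hypothetical features)
    V, w: attention parameters; clf_w, clf_b: linear classifier parameters
    Returns (risk, attn): a bag-level probability and one weight per sample.
    """
    # Attention score for each sample: w . tanh(V^T x)
    scores = []
    for x in bag:
        hidden = [math.tanh(sum(xi * V[i][j] for i, xi in enumerate(x)))
                  for j in range(len(w))]
        scores.append(sum(h * wj for h, wj in zip(hidden, w)))

    # Softmax over the samples in the bag (shifted by the max for stability)
    top = max(scores)
    exp_scores = [math.exp(s - top) for s in scores]
    total = sum(exp_scores)
    attn = [e / total for e in exp_scores]

    # Bag representation: attention-weighted average of the sample features
    n_features = len(bag[0])
    pooled = [sum(a * x[i] for a, x in zip(attn, bag)) for i in range(n_features)]

    # Bag-level risk from a logistic classifier on the pooled representation
    logit = sum(p * c for p, c in zip(pooled, clf_w)) + clf_b
    risk = 1.0 / (1.0 + math.exp(-logit))
    return risk, attn


# Toy data: one infant with 5 samples of 8 made-up features, untrained weights
random.seed(0)
n_features, n_hidden, n_samples = 8, 4, 5
V = [[random.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_features)]
w = [random.gauss(0, 1) for _ in range(n_hidden)]
clf_w = [random.gauss(0, 1) for _ in range(n_features)]
clf_b = 0.0
bag = [[random.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]

risk, attn = attention_mil_forward(bag, V, w, clf_w, clf_b)
dominant = max(range(n_samples), key=lambda i: attn[i])  # highest-weight sample
```

Under these assumptions only the infant-level label is needed for training, which is what makes the setup suit weakly annotated data, and the sample with the largest weight plays the role of a dominant predictive sample whose collection time can be compared with disease onset.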
