Machine learning methods for microbiome studies

Microbiome research is being actively conducted worldwide, and its results indicate that the human gut bacterial environment significantly affects the immune system, psychological conditions, cancers, obesity, and metabolic diseases. Thanks to advances in sequencing technology, microbiome studies with large numbers of samples are now feasible at an acceptable cost. Large sample sizes allow more sophisticated modeling with machine learning approaches to study the relationships between the microbiome and various traits. This article provides an overview of machine learning methods for non-data scientists interested in association analysis of microbiomes and host phenotypes. Once the genomic features of the microbiome are determined, various analysis methods can be used to explore the relationship between the microbiome and host phenotypes, including penalized regression, support vector machine (SVM), random forest, and artificial neural network (ANN). Deep neural network methods are also touched on. The analysis procedure, from environment setup to extraction of analysis results, is presented in the Python programming language.
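
As a rough illustration of the four method families named above, the following minimal sketch compares them by cross-validated AUC. It is not the article's own procedure. It assumes scikit-learn and NumPy are installed, it uses a synthetic feature table as a stand-in for transformed taxon abundances with a binary host phenotype, and the function names, hyperparameters, and the choice of an L2 (ridge) penalty for the penalized regression are all illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_synthetic_table(n_samples=200, n_taxa=50, seed=0):
    """Hypothetical stand-in for a samples-by-taxa feature table.

    Real data would come from a sequencing pipeline; here the features
    and the binary phenotype labels are simulated.
    """
    return make_classification(
        n_samples=n_samples,
        n_features=n_taxa,
        n_informative=8,
        n_redundant=0,
        class_sep=1.5,
        random_state=seed,
    )


def build_models(seed=0):
    """One illustrative model per method family named in the text."""
    return {
        # L2-penalized (ridge) logistic regression; C is the inverse penalty strength.
        "penalized_regression": make_pipeline(
            StandardScaler(), LogisticRegression(C=1.0, max_iter=1000)
        ),
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=seed),
        "ann": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=seed),
        ),
    }


def evaluate_models(X, y, n_splits=5, seed=0):
    """Return the mean cross-validated ROC AUC for each model."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, model in build_models(seed).items():
        scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
        results[name] = float(scores.mean())
    return results


if __name__ == "__main__":
    X, y = make_synthetic_table()
    for name, auc in evaluate_models(X, y).items():
        print(f"{name}: mean AUC = {auc:.3f}")
```

Scaling is placed inside each pipeline so that it is refit on the training folds only, which avoids leaking information from the held-out fold into the model. The random forest is left unscaled because tree-based splits do not depend on feature scale.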
