Developing a Reproducible Microbiome Data Analysis Pipeline Using the Amazon Web Services Cloud for a Cancer Research Group: Proof-of-Concept Study

Background: Cloud computing can substantially improve the efficiency of working with microbiome data sets and speed the translation of research findings into clinical practice. The Amazon Web Services (AWS) cloud offers a valuable option for microbiome data storage, computation, and analysis.

Objective: The goals of this study were to develop a microbiome data analysis pipeline using the AWS cloud and to conduct a proof-of-concept test of microbiome data storage, processing, and analysis.

Methods: A multidisciplinary team was formed to develop and test a reproducible microbiome data analysis pipeline built from multiple AWS cloud services for storage, computation, and data analysis. The pipeline was tested with two data sets: 19 vaginal microbiome samples and 50 gut microbiome samples.

Results: Using AWS features, we developed a microbiome data analysis pipeline that included Amazon Simple Storage Service (S3) for microbiome sequence storage, Linux Amazon Elastic Compute Cloud (EC2) instances (ie, servers) for data computation and analysis, and security keys to create and manage encryption for the pipeline. Bioinformatics and statistical tools (ie, Quantitative Insights Into Microbial Ecology 2 [QIIME 2] and RStudio) were installed on the Linux EC2 instances to run the microbiome statistical analysis. The pipeline was operated through command-line interfaces on Linux or macOS. Using this pipeline, we successfully processed and analyzed the 50 gut microbiome samples within 4 hours at very low cost (a c4.4xlarge EC2 instance costs $0.80 per hour, so 4 hours of compute costs about $3.20). Gut microbiome findings from the diversity, taxonomy, and abundance analyses were easily shared within our research team.

Conclusions: Building a microbiome data analysis pipeline with the AWS cloud is feasible. The pipeline is highly reliable, computationally powerful, and cost-effective, and it provides an efficient tool for conducting microbiome data analysis.
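The Results describe a storage-compute-storage flow: sequences held in S3, QIIME 2 run on a Linux EC2 instance, and results returned for sharing. The Python sketch below shows what one such run might look like as an ordered list of AWS CLI and QIIME 2 commands, together with the cost arithmetic behind the reported figure (4 hours × $0.80 per hour = $3.20, or about $0.064 per sample for 50 samples). It is an illustration under stated assumptions, not the study's code: the bucket, directory, and file names, the DADA2 truncation lengths, the sampling depth, the pre-trained classifier file, and the use of `--sse aws:kms` as the encryption mechanism are all hypothetical, and the QIIME 2 subcommands follow recent QIIME 2 releases, which may differ from the version the authors installed.

```python
"""Dry-run sketch of an S3 -> EC2 (QIIME 2) -> S3 microbiome run.

Every bucket name, file name, and parameter value here is a hypothetical
placeholder, not the configuration used in the study.
"""
from __future__ import annotations

import shlex
import subprocess
from decimal import Decimal

BUCKET = "s3://example-microbiome-bucket"  # hypothetical bucket
WORKDIR = "run01"                          # hypothetical local directory on EC2


def build_commands(trunc_f: int, trunc_r: int, depth: int) -> list[list[str]]:
    """Return the ordered commands for one run; nothing is executed here."""

    def p(name: str) -> str:
        return f"{WORKDIR}/{name}"

    return [
        # Pull reads, manifest, metadata, and a pre-trained classifier from S3.
        ["aws", "s3", "cp", f"{BUCKET}/raw/", p("raw/"), "--recursive"],
        # Import paired-end FASTQ files listed in a manifest into QIIME 2.
        ["qiime", "tools", "import",
         "--type", "SampleData[PairedEndSequencesWithQuality]",
         "--input-path", p("raw/manifest.tsv"),
         "--input-format", "PairedEndFastqManifestPhred33V2",
         "--output-path", p("demux.qza")],
        # Denoise reads into a feature table and representative sequences.
        ["qiime", "dada2", "denoise-paired",
         "--i-demultiplexed-seqs", p("demux.qza"),
         "--p-trunc-len-f", str(trunc_f),
         "--p-trunc-len-r", str(trunc_r),
         "--o-table", p("table.qza"),
         "--o-representative-sequences", p("rep-seqs.qza"),
         "--o-denoising-stats", p("denoising-stats.qza")],
        # Assign taxonomy with the (hypothetical) pre-trained classifier.
        ["qiime", "feature-classifier", "classify-sklearn",
         "--i-classifier", p("raw/classifier.qza"),
         "--i-reads", p("rep-seqs.qza"),
         "--o-classification", p("taxonomy.qza")],
        # Build a rooted tree for phylogenetic diversity metrics.
        ["qiime", "phylogeny", "align-to-tree-mafft-fasttree",
         "--i-sequences", p("rep-seqs.qza"),
         "--o-alignment", p("aligned-rep-seqs.qza"),
         "--o-masked-alignment", p("masked-aligned-rep-seqs.qza"),
         "--o-tree", p("unrooted-tree.qza"),
         "--o-rooted-tree", p("rooted-tree.qza")],
        # Rarefy to a common depth and compute alpha and beta diversity.
        ["qiime", "diversity", "core-metrics-phylogenetic",
         "--i-phylogeny", p("rooted-tree.qza"),
         "--i-table", p("table.qza"),
         "--p-sampling-depth", str(depth),
         "--m-metadata-file", p("raw/metadata.tsv"),
         "--output-dir", p("core-metrics-results")],
        # Push outputs (not the raw inputs) back to S3 with KMS server-side encryption.
        ["aws", "s3", "cp", p(""), f"{BUCKET}/results/{WORKDIR}/",
         "--recursive", "--exclude", "raw/*", "--sse", "aws:kms"],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


def on_demand_cost(hourly_rate: str, hours: int, samples: int) -> tuple[Decimal, Decimal]:
    """Total and per-sample compute cost, using Decimal for exact dollar amounts."""
    total = Decimal(hourly_rate) * hours
    return total, total / samples


if __name__ == "__main__":
    run(build_commands(trunc_f=240, trunc_r=200, depth=10000))
    total, per_sample = on_demand_cost("0.80", hours=4, samples=50)
    print(f"estimated compute cost: ${total} total, ${per_sample} per sample")
```

Calling `run(..., dry_run=False)` on an instance with the AWS CLI and QIIME 2 installed would execute the steps in order and stop at the first failure. The cost helper covers on-demand compute only and ignores S3 storage and data transfer charges; because Linux on-demand EC2 instances are billed per second, a run that finishes in under 4 hours would cost proportionally less.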