Unifying the analysis of high-throughput sequencing datasets: characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis

Background
Experimental designs that take advantage of high-throughput sequencing to generate datasets include RNA sequencing (RNA-seq), chromatin immunoprecipitation sequencing (ChIP-seq), sequencing of 16S rRNA gene fragments, metagenomic analysis and selective growth experiments. In each case the underlying data are similar and are composed of counts of sequencing reads mapped to a large number of features in each sample. Despite this underlying similarity, the data analysis methods used for these experimental designs are all different, and do not translate across experiments. Alternative methods have been developed in the physical and geological sciences that treat similar data as compositions. Compositional data analysis methods transform the data to relative abundances with the result that the analyses are more robust and reproducible.

Results
Data from an in vitro selective growth experiment, an RNA-seq experiment and the Human Microbiome Project 16S rRNA gene abundance dataset were examined by ALDEx2, a compositional data analysis tool that uses Bayesian methods to infer technical and statistical error. The ALDEx2 approach is shown to be suitable for all three types of data: it correctly identifies both the direction and differential abundance of features in the differential growth experiment, it identifies a substantially similar set of differentially expressed genes in the RNA-seq dataset as the leading tools and it identifies as differential the taxa that distinguish the tongue dorsum and buccal mucosa in the Human Microbiome Project dataset. The design of ALDEx2 reduces the number of false positive identifications that result from datasets composed of many features in few samples.

Conclusion
Statistical analysis of high-throughput sequencing datasets composed of per feature counts showed that the ALDEx2 R package is a simple and robust tool, which can be applied to RNA-seq, 16S rRNA gene sequencing and differential growth datasets, and by extension to other techniques that use a similar approach.
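
The general idea of treating per-feature read counts as a composition can be illustrated with a short sketch. The example below is not the ALDEx2 implementation (ALDEx2 is an R package); it is a minimal Python illustration that assumes a Dirichlet model for the technical uncertainty in the counts, a prior pseudo-count of 0.5, and a centered log-ratio (CLR) transform. The function names, the example counts and the number of Monte Carlo instances are all hypothetical.

```python
import math
import random


def dirichlet_sample(counts, prior, rng):
    """Draw one vector of proportions from Dirichlet(counts + prior).

    A Dirichlet draw is obtained by normalising independent gamma draws.
    The prior (an assumed value here) keeps zero counts strictly positive.
    """
    gammas = [rng.gammavariate(c + prior, 1.0) for c in counts]
    total = sum(gammas)
    return [g / total for g in gammas]


def clr(proportions):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in proportions]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


def clr_instances(counts, n_instances=128, prior=0.5, seed=0):
    """Return n_instances CLR-transformed Monte Carlo draws for one sample."""
    rng = random.Random(seed)
    return [clr(dirichlet_sample(counts, prior, rng)) for _ in range(n_instances)]


# Hypothetical read counts for four features in a single sample.
counts = [120, 0, 35, 845]
instances = clr_instances(counts)

# Per-feature mean CLR value across the Monte Carlo instances.
n_features = len(counts)
mean_clr = [
    sum(inst[i] for inst in instances) / len(instances)
    for i in range(n_features)
]
```

Each CLR vector sums to zero by construction, so values are interpreted relative to the geometric mean of the sample rather than as absolute abundances. Repeating the draw many times gives a distribution per feature, which is one way the uncertainty of low-count features (such as the zero above) can be carried into downstream comparisons.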
