From RNA-seq to Biological Inference: Using Compositional Data Analysis in Meta-Transcriptomics.

The analysis of high-throughput sequencing datasets from mixed microbial communities (meta-transcriptomics) is substantially more complex than the analysis of datasets from single organisms. Commonly used RNA-seq methods, when adapted to meta-transcriptome datasets, can give misleading results and may not use all the available information in a consistent manner. However, meta-transcriptomic experiments can be investigated in a principled manner by combining Bayesian probabilistic modeling of the data at a functional level with analysis under a compositional data analysis paradigm. We present a worked example of differential functional evaluation of mixed-species microbial communities obtained from human clinical samples and sequenced on an Illumina platform. We demonstrate methods to map reads directly to functions, conduct a compositionally appropriate exploratory data analysis, evaluate differential relative abundance, and finally identify compositionally associated (constant-ratio) functions. Using these approaches, we have found that meta-transcriptomic functional analyses are highly reproducible and convey significant information about the ecosystem.
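
To make the compositional terms above concrete, the following is a minimal sketch of two standard building blocks: the centered log-ratio (clr) transform of one sample, and the variance of the log-ratio between two functions across samples, which approaches zero when the two functions keep a constant ratio. It is not the worked example from the paper. The function names, the fixed pseudocount used to handle zeros (a simple stand-in for the Bayesian modeling the abstract describes), and the count values are all assumptions made for illustration.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's per-function read counts.

    The pseudocount is an illustrative zero-handling shortcut (assumption),
    not the probabilistic model described in the abstract.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [value - mean_log for value in logs]


def log_ratio_variance(x, y, pseudocount=0.5):
    """Sample variance of log(x / y) across samples.

    x and y hold the counts of two functions in the same samples.
    A value near zero indicates an approximately constant ratio.
    """
    log_ratios = [
        math.log((a + pseudocount) / (b + pseudocount)) for a, b in zip(x, y)
    ]
    mean = sum(log_ratios) / len(log_ratios)
    return sum((r - mean) ** 2 for r in log_ratios) / (len(log_ratios) - 1)


if __name__ == "__main__":
    # Hypothetical read counts for four functions in a single sample.
    sample = [120, 30, 0, 850]
    print(clr(sample))

    # Hypothetical counts of two functions across three samples.
    function_a = [10, 20, 40]
    function_b = [5, 10, 20]
    print(log_ratio_variance(function_a, function_b, pseudocount=0.0))
```

The clr values of a sample always sum to zero, and with a pseudocount of zero they are unchanged when every count is multiplied by the same factor, which is why differences in sequencing depth do not affect the transformed values in that case.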
