Compression-based distance (CBD): a simple, rapid, and accurate method for microbiota composition comparison

Background: Perturbations in intestinal microbiota composition have been associated with a variety of gastrointestinal tract-related diseases. Symptoms have been alleviated by treatments that shift the gastrointestinal tract microbiota toward that of healthy individuals. Identifying differences in microbiota composition through 16S rRNA gene hypervariable tag sequencing therefore has profound health implications. Current computational methods for comparing microbial communities are usually based on multiple alignments and phylogenetic inference, which makes them time consuming and demands exceptional expertise and computational resources. As sequencing datasets rapidly grow in size, simpler analysis methods are needed to meet the growing computational burden of microbiota comparisons. We have therefore developed a simple, rapid, and accurate method, independent of multiple alignments and phylogenetic inference, to support microbiota comparisons.

Results: We created a metric, called compression-based distance (CBD), for quantifying the degree of similarity between microbial communities. CBD uses the repetitive nature of hypervariable tag datasets and well-established compression algorithms to approximate the total information shared between two datasets. Three published microbiota datasets were used as test cases to evaluate CBD as a practical tool. CBD recaptured 100% of the statistically significant conclusions reported in the original studies and required less computational time than similar tools, without expert user intervention.

Conclusion: CBD provides a simple, rapid, and accurate method for assessing distances between gastrointestinal tract microbiota 16S hypervariable tag datasets.
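
The idea of approximating shared information with a compressor can be illustrated with a minimal sketch. The abstract does not state the CBD formula, so the code below substitutes the normalized compression distance (NCD) of Cilibrasi and Vitányi, a closely related and well-known measure; it should not be read as the authors' exact definition. The choice of `zlib`, the helper names `compressed_size`, `ncd`, and `make_sample`, and the synthetic "tags" are all assumptions made for illustration.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (a stand-in for C(x))."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).

    NOTE: this is NCD, used here as an assumption; the CBD formula itself
    is not given in the abstract.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def make_sample(tags, n_reads, rng):
    """Hypothetical helper: draw n_reads tags at random, one read per line."""
    return "\n".join(rng.choice(tags) for _ in range(n_reads)).encode() + b"\n"


rng = random.Random(0)

# Two synthetic "communities", each a pool of 50 random 100-base tags.
community_1 = ["".join(rng.choice("ACGT") for _ in range(100)) for _ in range(50)]
community_2 = ["".join(rng.choice("ACGT") for _ in range(100)) for _ in range(50)]

# Samples a and b come from the same community; sample c from a different one.
# Sizes are kept small so a concatenated pair fits in zlib's 32 KiB window.
sample_a = make_sample(community_1, 150, rng)
sample_b = make_sample(community_1, 150, rng)
sample_c = make_sample(community_2, 150, rng)

print("distance(a, b):", round(ncd(sample_a, sample_b), 3))
print("distance(a, c):", round(ncd(sample_a, sample_c), 3))
```

Samples drawn from the same tag pool share repeated substrings, so their concatenation compresses better and the distance comes out smaller than for samples from different pools. For real tag datasets, which are far larger than zlib's 32 KiB window, a compressor with a longer-range model (for example `bz2` or `lzma` from the Python standard library) would likely be a more suitable choice.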
