Clustering of Microbiome Data: Evaluation of Ensemble Design Approaches

Research on the human microbiome is shifting towards uncovering its association with overall wellbeing and applying this knowledge in personalized medicine and connected health. Driven by more affordable high-throughput sequencing, the rate of microbiome data generation has increased, enabling the efficient implementation of data-driven algorithms. This study evaluates the possibility of identifying clusters in human microbiome data from taxonomic profiles, relying on 24 different $\beta$-diversity measures and on both individual and ensemble clustering approaches. We explore how ensemble creation techniques and parameter selection influence the robustness and quality of the consensus partition. Furthermore, we evaluate changes in clustering performance after dimensionality reduction. The results indicate that careful selection of algorithm parameters and ensemble design is needed to ensure a stable consensus partition. Reducing the number of input features with kernel principal component analysis is accompanied by a loss of discrimination potential.
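As an illustration of what a consensus partition over several $\beta$-diversity measures can look like, the following is a minimal sketch and not the study's implementation. It assumes a co-association (evidence accumulation) consensus function, average-linkage hierarchical clustering as the base algorithm, and three SciPy dissimilarity metrics standing in for the 24 measures; the abstract does not specify any of these choices. The function names `base_partitions` and `consensus_partition` and the toy Dirichlet profiles are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform


def base_partitions(profiles, metrics, n_clusters):
    """Cluster the samples once per dissimilarity measure (average linkage)."""
    partitions = []
    for metric in metrics:
        dist = pdist(profiles, metric=metric)  # condensed pairwise dissimilarities
        tree = linkage(dist, method="average")
        partitions.append(fcluster(tree, t=n_clusters, criterion="maxclust"))
    return partitions


def consensus_partition(partitions, n_clusters):
    """Combine base partitions through a co-association matrix (assumed design)."""
    labels = np.asarray(partitions)  # shape: (n_partitions, n_samples)
    # co[i, j] = fraction of base partitions placing samples i and j together
    co = (labels[:, :, None] == labels[:, None, :]).mean(axis=0)
    dist = 1.0 - co  # samples that are often co-clustered are close
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy relative-abundance profiles (synthetic, for illustration only):
# two groups of 10 samples dominated by different taxa.
rng = np.random.default_rng(0)
group_a = rng.dirichlet([20, 20, 1, 1, 1, 1], size=10)
group_b = rng.dirichlet([1, 1, 1, 1, 20, 20], size=10)
profiles = np.vstack([group_a, group_b])

partitions = base_partitions(profiles, ["braycurtis", "canberra", "euclidean"], 2)
consensus = consensus_partition(partitions, 2)
```

In this sketch the ensemble's diversity comes only from varying the dissimilarity measure. Varying the base algorithm or its parameters would be other ways to build the ensemble, and the choice of the final cut (here a fixed number of clusters) is one of the design decisions that can affect the stability of the consensus.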
