Uncovering the Horseshoe Effect in Microbial Analyses

ABSTRACT The horseshoe effect is a phenomenon that has long intrigued ecologists. The effect was commonly thought to be an artifact of dimensionality reduction, and multiple techniques were developed to unravel this phenomenon and simplify interpretation. Here, we provide evidence that horseshoes arise as a consequence of distance metrics that saturate, a familiar concept in other fields but new to microbial ecology. Saturation loses information about community dissimilarity because a saturated metric cannot discriminate between samples that do not share any common features. The phenomenon illuminates niche differentiation in microbial communities and indicates species turnover along environmental gradients. Here we propose a rationale for the horseshoe effect observed when multiple dimensionality reduction techniques are applied to simulations, soil samples, and samples from postmortem mice. An in-depth understanding of this phenomenon allows niche differentiation patterns to be identified from high-level ordination plots, which can guide conventional statistical tools to pinpoint microbial niches along environmental gradients.

IMPORTANCE The horseshoe effect is often considered an artifact of dimensionality reduction. We show that this is not the case for microbiome data and that, in fact, horseshoes can help analysts discover microbial niches across environments.
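The saturation argument can be illustrated with a small simulation. The sketch below is not the authors' simulation: the Gaussian species response curves, the truncation threshold, the choice of Bray-Curtis as an example of a saturating metric, and all function names are assumptions made for illustration. It builds samples along a single gradient with species turnover, shows that pairs of samples sharing no species all receive the same maximal distance, and embeds the distances with a plain principal coordinates analysis.

```python
import numpy as np


def simulate_gradient(n_samples=50, n_species=50, width=5.0):
    """Hypothetical gradient: each species peaks at a different position."""
    positions = np.linspace(0, 1, n_samples)[:, None] * (n_species - 1)
    optima = np.arange(n_species)[None, :]
    table = np.exp(-0.5 * ((positions - optima) / width) ** 2)
    # Truncate the tails so that distant samples share no species at all.
    table[table < 1e-3] = 0.0
    return table


def bray_curtis(table):
    """Pairwise Bray-Curtis dissimilarity between the rows of `table`."""
    diff = np.abs(table[:, None, :] - table[None, :, :]).sum(axis=2)
    total = (table[:, None, :] + table[None, :, :]).sum(axis=2)
    return diff / total


def pcoa(dist, n_axes=2):
    """Classical principal coordinates analysis via double centering."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_axes]
    # Clip tiny negative eigenvalues that arise from non-Euclidean distances.
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))


if __name__ == "__main__":
    table = simulate_gradient()
    dist = bray_curtis(table)
    # Samples 40 and 49 lie at different gradient positions, yet both are
    # "maximally distant" from sample 0 because neither shares a species with it.
    print(dist[0, 10], dist[0, 40], dist[0, 49])
    coords = pcoa(dist)
    # On the second axis the two ends of the gradient fall on the same side,
    # opposite to the middle: the arch of the horseshoe.
    print(coords[0, 1], coords[25, 1], coords[-1, 1])
```

Under these assumptions the metric stops growing once two samples have no species in common, so the ordination cannot place the ends of the gradient progressively farther apart, and the gradient bends into an arch instead of a straight line.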
