Nonlinear dimensionality reduction: Alternative ordination approaches for extracting and visualizing biodiversity patterns in tropical montane forest vegetation data

Abstract: Ecological patterns are difficult to extract directly from vegetation data. Vegetation surveys yield a large number of interrelated species occurrence variables. Because often only a limited number of ecological gradients determine species distributions, the data might be represented by far fewer, effectively independent variables. This can be achieved by reducing the dimensionality of the data. Conventional methods are either limited to linear feature extraction (e.g., principal component analysis and Classical Multidimensional Scaling, CMDS) or require a priori assumptions about the intrinsic data dimensionality (e.g., Nonmetric Multidimensional Scaling, NMDS, and self-organizing maps, SOM). In this study we explored the potential of Isometric Feature Mapping (Isomap). This new method of dimensionality reduction is a nonlinear generalization of CMDS. Isomap is based on a nonlinear geodesic inter-point distance matrix. Estimating geodesic distances requires one free threshold parameter, which defines the linear geometry to be preserved in the global nonlinear distance structure. We compared Isomap to its linear (CMDS) and nonmetric (NMDS) equivalents. Furthermore, the use of geodesic distances also allowed us to extend NMDS to a version that we called NMDS-G. In addition, we investigated a supervised Isomap variant (S-Isomap) and showed that all these techniques are interpretable within a single methodological framework. As an example, we investigated seven plots (subdivided into 456 subplots) in different secondary tropical montane forests with 773 species of vascular plants. A key problem in the study of tropical vegetation data is the heterogeneous small-scale variability, which implies large ranges of β-diversity. The CMDS and NMDS methods did not reduce the data dimensionality reasonably. In contrast, Isomap explained 95% of the data variance in the first five dimensions and provided ecologically interpretable visualizations; NMDS-G yielded similar results. The main shortcomings of NMDS-G were its high computational cost and the requirement to predefine the dimension of the embedding space. The S-Isomap learning scheme did not improve the Isomap variant for an optimal threshold parameter but substantially improved the nonoptimal solutions. We conclude that Isomap, as a new ordination method, allows effective representations of high-dimensional vegetation data sets. The method is promising because it does not require a priori assumptions and is computationally efficient.
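The geodesic distance step that the abstract describes can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the threshold is treated as a radius (here called `epsilon`) below which direct distances are trusted, Euclidean distance stands in for whatever dissimilarity measure the study used, and the function names are hypothetical. Distances above the threshold are replaced by shortest-path lengths through the neighborhood graph, computed with Dijkstra's algorithm. The resulting matrix would then be passed to CMDS (for Isomap) or NMDS (for NMDS-G).

```python
import heapq
import math


def euclidean(a, b):
    """Straight-line distance; an assumed stand-in for the study's dissimilarity."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def geodesic_distances(points, epsilon, dist=euclidean):
    """Approximate geodesic distances through an epsilon-neighborhood graph.

    `epsilon` is the assumed form of the free threshold parameter: only
    direct distances not exceeding it are kept as graph edges.
    """
    n = len(points)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = dist(points[i], points[j])
            if d <= epsilon:
                neighbors[i].append((j, d))
                neighbors[j].append((i, d))

    # Dijkstra's algorithm from every point; unreachable pairs stay infinite.
    geo = [[math.inf] * n for _ in range(n)]
    for source in range(n):
        geo[source][source] = 0.0
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > geo[source][u]:
                continue  # stale heap entry
            for v, w in neighbors[u]:
                nd = d + w
                if nd < geo[source][v]:
                    geo[source][v] = nd
                    heapq.heappush(heap, (nd, v))
    return geo


# Five points on a unit half circle: the end points are 2 apart in a
# straight line, but farther apart when measured along the arc.
arc = [(math.cos(k * math.pi / 4), math.sin(k * math.pi / 4)) for k in range(5)]
geo = geodesic_distances(arc, epsilon=0.8)
print(euclidean(arc[0], arc[4]), geo[0][4])
```

If the threshold is too small, the graph falls apart and some distances remain infinite; if it is too large, the geodesic distances collapse to the direct ones and the procedure reduces to the linear case. This is one way to read the abstract's remark that results depend on an optimal threshold parameter.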
