Instability of ordination results under changes in input data order: explanations and remedies

Correspondence analysis (CA) and its detrended form (DCA), as produced by the program CANOCO, are unstable under reordering of the species and sites in the input data matrix. In CA, the main cause of the instability is the use of insufficiently stringent convergence criteria in the power algorithm used to estimate the eigenvalues. Stricter criteria give results that are acceptably stable. The divisive classification program TWINSPAN uses CA based on a similar algorithm, but with extremely lax convergence criteria, and is thus susceptible to extreme instability. We detected an order-dependent programming error in the non-linear rescaling procedure that forms part of DCA. When this bug is corrected, much of the instability in DCA disappears. The stability of DCA solutions is further enhanced by the use of strict convergence criteria. In our trials, much of the instability occurred on axes 3 and 4, but one should not assume that published two-dimensional ordinations are sufficiently accurate. Data sets that have pairs of almost equal eigenvalues among the first three axes could suffer from marked instability in the first two dimensions. We recommend that a debugged, strict version of CANOCO be released. Meanwhile, users can check the stability of their CA and DCA ordinations using the software that we have made available on the World Wide Web (http://www.helsinki.fi/jhoksane/). An accurate program for CA, a debugged, strict version of DECORANA (for DCA) and a strict version of TWINSPAN are also available at our site.
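The link between convergence criteria and order dependence can be illustrated with a minimal sketch of reciprocal averaging, the power-iteration form of CA. This is not the CANOCO, DECORANA or TWINSPAN code: the function names, the toy table, the starting scores (taken from each site's position in the input) and the stopping rule (largest change in site scores) are all assumptions made for illustration. The starting scores are what carries the input order into the result, and an early stop leaves part of that influence in the solution.

```python
import math
import random

# Hypothetical toy abundance table (rows = sites, columns = species).
TOY_TABLE = [
    [5, 3, 1, 0, 0],
    [3, 5, 2, 1, 0],
    [1, 3, 5, 2, 1],
    [0, 1, 3, 5, 3],
    [0, 0, 1, 3, 5],
    [0, 0, 0, 2, 6],
]


def ca_first_axis(table, tol=1e-12, max_iter=10000):
    """First CA axis by reciprocal averaging (power iteration).

    Assumes every row and column total is positive.
    Returns (eigenvalue, site_scores, iterations_used).
    """
    n_sites, n_species = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(n_sites)) for j in range(n_species)]
    grand = sum(row_tot)

    def standardize(x):
        # Remove the trivial axis (weighted mean), scale to unit weighted variance.
        mean = sum(w * v for w, v in zip(row_tot, x)) / grand
        x = [v - mean for v in x]
        norm = math.sqrt(sum(w * v * v for w, v in zip(row_tot, x)) / grand)
        return [v / norm for v in x], norm

    # Assumed order-dependent start: scores follow each site's input position.
    x, _ = standardize([float(i) for i in range(n_sites)])
    eig, it = 0.0, 0
    for it in range(1, max_iter + 1):
        # Species scores are weighted averages of site scores, and vice versa.
        u = [sum(table[i][j] * x[i] for i in range(n_sites)) / col_tot[j]
             for j in range(n_species)]
        x_new = [sum(table[i][j] * u[j] for j in range(n_species)) / row_tot[i]
                 for i in range(n_sites)]
        # The shrinkage of the scores in one pass estimates the eigenvalue.
        x_new, eig = standardize(x_new)
        change = max(abs(a - b) for a, b in zip(x_new, x))
        x = x_new
        if change < tol:
            break
    return eig, x, it


def order_discrepancy(table, tol, seed=0, max_iter=10000):
    """Compare the solution for the original and a shuffled site order.

    Returns (eigenvalue_gap, largest_site_score_gap).
    """
    perm = list(range(len(table)))
    random.Random(seed).shuffle(perm)
    shuffled = [table[p] for p in perm]
    eig_a, x_a, _ = ca_first_axis(table, tol, max_iter)
    eig_b, x_b, _ = ca_first_axis(shuffled, tol, max_iter)
    # Put the shuffled scores back in the original order.
    restored = [0.0] * len(table)
    for pos, p in enumerate(perm):
        restored[p] = x_b[pos]
    # The sign of an axis is arbitrary, so align it before comparing.
    sign = 1.0 if sum(a * b for a, b in zip(x_a, restored)) >= 0 else -1.0
    score_gap = max(abs(a - sign * b) for a, b in zip(x_a, restored))
    return abs(eig_a - eig_b), score_gap


if __name__ == "__main__":
    for tol in (1e-2, 1e-12):  # a lax and a strict criterion, both illustrative
        print(tol, order_discrepancy(TOY_TABLE, tol))
```

Running the comparison with several seeds and tolerances shows how far two input orders can disagree for a given stopping rule. Only the first axis is covered here; later axes, detrending and non-linear rescaling are outside this sketch.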
