RCLUS, A NEW PROGRAM FOR CLUSTERING ASSOCIATED SPECIES: A DEMONSTRATION USING A MOJAVE DESERT PLANT COMMUNITY DATASET

Abstract. This paper presents a new clustering program named RCLUS that was developed for species (R-mode) analysis of plant community data. RCLUS identifies clusters of co-occurring species that meet a user-specified cutoff level of positive association with each other. The “strict affinity” clustering algorithm in RCLUS builds clusters of species whose pairwise associations all exceed the cutoff level, whereas the “coalition” clustering algorithm only requires that the mean pairwise association of the cluster exceed the cutoff level. Both algorithms allow species to belong to multiple clusters, thus accommodating both generalist and specialist species. Using a 60-plot dataset of perennial plants occurring on the Beaver Dam Slope in southwestern Utah, we carried out RCLUS analyses and compared the results with 2 widely used clustering techniques: UPGMA and PAM. We found that many of the RCLUS clusters were subsets of the UPGMA and PAM clusters, although novel species combinations were also generated by RCLUS. An advantage of RCLUS over these methods is its ability to exclude species that are poorly represented in a dataset as well as species lacking strong association patterns. The RCLUS program also includes modules that assess the affinity of a given species, plot, or environmental variable to a given cluster. We found statistically significant correlations between some of the RCLUS species clusters and certain environmental variables of the study area (elevation and topographical position). We also noted differences in clustering behavior when different association coefficients were used in RCLUS and found that those incorporating joint absences (e.g., the phi coefficient) produced more clusters and more even numbers of species per cluster than those not incorporating joint absences (e.g., the Jaccard index).
In addition to the species association application described in this paper, the RCLUS algorithms could be used for preliminary data stratification in sample (Q-mode) analysis. The indirect link between sample plots and RCLUS species clusters could also be exploited to yield a form of “fuzzy” classification of plots or to characterize species pools of plots.
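
The two cluster criteria and the two example coefficients named in the abstract can be illustrated with a minimal Python sketch. This is not the RCLUS code and does not reproduce its cluster-building search, which the abstract does not describe; it only checks whether a given candidate set of species would satisfy each criterion. The function names, the toy species labels (`sp_a`, `sp_b`, `sp_c`), the plot numbers, and the cutoff of 0.3 are all hypothetical and chosen for illustration. The Jaccard index and phi coefficient use their standard definitions from a 2x2 presence/absence table.

```python
from itertools import combinations
from math import sqrt


def contingency(x_plots, y_plots, n_plots):
    """2x2 presence/absence counts for two species (sets of plot ids)."""
    a = len(x_plots & y_plots)   # plots with both species
    b = len(x_plots - y_plots)   # plots with the first species only
    c = len(y_plots - x_plots)   # plots with the second species only
    d = n_plots - a - b - c      # joint absences
    return a, b, c, d


def jaccard(x_plots, y_plots, n_plots):
    """Jaccard index: ignores joint absences."""
    a, b, c, _ = contingency(x_plots, y_plots, n_plots)
    return a / (a + b + c) if (a + b + c) else 0.0


def phi(x_plots, y_plots, n_plots):
    """Phi coefficient: incorporates joint absences (d)."""
    a, b, c, d = contingency(x_plots, y_plots, n_plots)
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0


def pairwise_values(cluster, occurrences, n_plots, coef):
    """Association value for every pair of species in a candidate cluster."""
    return [coef(occurrences[s], occurrences[t], n_plots)
            for s, t in combinations(sorted(cluster), 2)]


def meets_strict_affinity(cluster, occurrences, n_plots, coef, cutoff):
    """True if every pairwise association exceeds the cutoff."""
    vals = pairwise_values(cluster, occurrences, n_plots, coef)
    return bool(vals) and all(v > cutoff for v in vals)


def meets_coalition(cluster, occurrences, n_plots, coef, cutoff):
    """True if the mean pairwise association exceeds the cutoff."""
    vals = pairwise_values(cluster, occurrences, n_plots, coef)
    return bool(vals) and sum(vals) / len(vals) > cutoff


# Hypothetical toy data: 3 species recorded across 6 plots.
occurrences = {
    "sp_a": {1, 2, 3, 4},
    "sp_b": {1, 2, 3},
    "sp_c": {3, 4, 5},
}
N_PLOTS = 6
CUTOFF = 0.3  # illustrative cutoff, not a value from the paper

trio = {"sp_a", "sp_b", "sp_c"}
strict_ok = meets_strict_affinity(trio, occurrences, N_PLOTS, jaccard, CUTOFF)
coalition_ok = meets_coalition(trio, occurrences, N_PLOTS, jaccard, CUTOFF)
print(strict_ok, coalition_ok)
```

In this toy case the three Jaccard values are 0.75, 0.4, and 0.2. The weakest pair falls below the cutoff, so the trio fails the strict affinity criterion, while the mean of 0.45 lets it pass the coalition criterion. Swapping `jaccard` for `phi` shows how a coefficient that counts joint absences can change which candidate sets qualify.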
