Clustering and Partitioning

Hierarchical clustering methods, such as two-way indicator species analysis (TWINSPAN), and partitioning techniques, such as K-means partitioning, are useful tools for summarising group structure within the large, complex, multivariate data-sets that are increasingly common in palaeolimnology. Incorporating one- or two-dimensional constraints into the clustering algorithms provides a means of exploring group structure in temporal (stratigraphical) data and in geographical (modern) data, respectively. Indicator species analysis, with its associated permutation tests, is a simple and effective means of detecting statistically significant indicator species for any grouping of a set of objects. The recently developed approach of multivariate regression trees combines partitioning and data exploration with regression, data interpretation, and modelling.
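To make the idea of an indicator species test concrete, here is a minimal pure-Python sketch. It is not the chapter's own implementation. It assumes the IndVal formulation of Dufrêne and Legendre (1997): for species j and group k, IndVal = A × B. A (specificity) is the mean abundance of j in k divided by the sum of its group mean abundances. B (fidelity) is the fraction of sites in k where j occurs. Significance is assessed by shuffling the group labels among sites. The function names `indval` and `indval_test`, the p-value convention (exceedances + 1)/(permutations + 1), and the toy data are all illustrative assumptions.

```python
import random
from statistics import mean


def indval(abund, groups):
    """Return (best_group, indval) per species, on a 0-1 scale (x100 for %).

    abund  : list of site rows, each a list of species abundances
    groups : list of group labels, one per site
    Assumed form: IndVal = A * B (specificity * fidelity).
    """
    labels = sorted(set(groups))
    result = []
    for j in range(len(abund[0])):
        means, freqs = {}, {}
        for g in labels:
            vals = [row[j] for row, lab in zip(abund, groups) if lab == g]
            means[g] = mean(vals)                                # group mean abundance
            freqs[g] = sum(1 for v in vals if v > 0) / len(vals)  # fidelity B
        total = sum(means.values())
        best_g, best_v = None, 0.0
        for g in labels:
            a = means[g] / total if total > 0 else 0.0  # specificity A
            v = a * freqs[g]
            if best_g is None or v > best_v:
                best_g, best_v = g, v
        result.append((best_g, best_v))
    return result


def indval_test(abund, groups, n_perm=999, seed=1):
    """Permutation test on each species' maximum IndVal across groups.

    Group labels are shuffled among sites (group sizes are preserved), and we
    count how often the permuted maximum reaches the observed value.
    """
    rng = random.Random(seed)
    observed = indval(abund, groups)
    exceed = [0] * len(observed)
    perm = list(groups)
    for _ in range(n_perm):
        rng.shuffle(perm)
        for j, (_, v) in enumerate(indval(abund, perm)):
            if v >= observed[j][1]:
                exceed[j] += 1
    # The observed statistic is counted as one member of the null distribution.
    return [(g, v, (exceed[j] + 1) / (n_perm + 1))
            for j, (g, v) in enumerate(observed)]


# Hypothetical toy data: 8 sites x 3 taxa, two a priori groups of sites.
abund = [
    [5, 0, 1], [4, 0, 2], [6, 1, 1], [5, 0, 3],
    [0, 7, 2], [0, 5, 1], [1, 6, 2], [0, 8, 1],
]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

for g, v, p in indval_test(abund, groups, n_perm=199):
    print(g, round(v, 3), round(p, 3))
```

In this toy example the first two taxa are concentrated in, and faithful to, one group each, so they obtain high IndVal values and small p-values. The third taxon occurs at every site in similar amounts. Every label permutation gives it an IndVal at least as large as the observed one, so its p-value is 1: a widespread generalist is not a useful indicator of either group.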
