Multivariate methods using mixtures: Correspondence analysis, scaling and pattern-detection

Matrices of binary or count data are modelled within a unified statistical framework that uses finite mixtures to group the rows, the columns, or both. These likelihood-based one-mode and two-mode fuzzy clusterings yield maximum likelihood estimates of the parameters and allow models to be compared using likelihood ratio tests or information criteria. Geometric developments focused on pattern detection give likelihood-based analogues of several techniques in multivariate analysis, including multidimensional scaling, association analysis, ordination, correspondence analysis, and the construction of biplots. Illustrative examples demonstrate the effectiveness of these visualisations for identifying patterns of ecological significance (e.g. abrupt versus slow species turnover).
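
The one-mode (row) case can be illustrated with a minimal sketch, assuming the simplest row-clustering parameterisation for a binary matrix: row i belongs to group r with probability pi_r, and given its group, each entry y_ij is an independent Bernoulli draw with probability theta_rj. The parameters are fitted by EM, the posterior memberships z_ir are the fuzzy clustering, and AIC and BIC compare different numbers of groups R. The function names (`fit_row_bernoulli_mixture`, `information_criteria`), the use of several random starts, and taking the number of rows as the BIC sample size are all illustrative assumptions rather than the paper's own implementation. Two-mode clustering, count data and the scaling and biplot constructions are not shown.

```python
import numpy as np


def _em_once(Y, R, max_iter, tol, rng, eps=1e-10):
    """One EM run for a row-clustered Bernoulli mixture (illustrative model)."""
    n, p = Y.shape
    # Random fuzzy start: each row gets a random membership vector summing to 1.
    z = rng.random((n, R))
    z /= z.sum(axis=1, keepdims=True)
    loglik_old = -np.inf
    for _ in range(max_iter):
        # M-step: mixing proportions and membership-weighted column proportions.
        nk = np.maximum(z.sum(axis=0), eps)                      # (R,)
        pi = nk / n
        theta = np.clip((z.T @ Y) / nk[:, None], eps, 1 - eps)  # (R, p)
        # E-step: log pi_r + sum_j log Bernoulli(y_ij | theta_rj), in log space.
        log_w = (Y @ np.log(theta).T + (1 - Y) @ np.log(1 - theta).T
                 + np.log(pi))                                   # (n, R)
        m = log_w.max(axis=1, keepdims=True)
        log_row = m[:, 0] + np.log(np.exp(log_w - m).sum(axis=1))  # log-sum-exp
        z = np.exp(log_w - log_row[:, None])   # posterior (fuzzy) memberships
        loglik = log_row.sum()                 # observed-data log-likelihood
        if loglik - loglik_old < tol:
            break
        loglik_old = loglik
    return pi, theta, z, loglik


def fit_row_bernoulli_mixture(Y, R, n_starts=5, max_iter=500, tol=1e-8, seed=0):
    """Keep the best of several random EM starts (EM finds only local maxima)."""
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        fit = _em_once(Y, R, max_iter, tol, rng)
        if best is None or fit[3] > best[3]:
            best = fit
    return best


def information_criteria(loglik, n, p, R):
    """Return (AIC, BIC); BIC uses n = number of rows, which is an assumption."""
    k = (R - 1) + R * p   # free mixing proportions + Bernoulli probabilities
    return -2 * loglik + 2 * k, -2 * loglik + k * np.log(n)


# Usage on simulated data: two row groups with opposite column patterns.
rng = np.random.default_rng(1)
true_theta = np.array([[0.9] * 4 + [0.1] * 4,
                       [0.1] * 4 + [0.9] * 4])
groups = rng.integers(0, 2, size=60)
Y = (rng.random((60, 8)) < true_theta[groups]).astype(int)

fits = {}
for R in (1, 2, 3):
    pi, theta, z, ll = fit_row_bernoulli_mixture(Y, R)
    aic, bic = information_criteria(ll, *Y.shape, R)
    fits[R] = (pi, theta, z, ll, aic, bic)
    print(f"R={R}: logL={ll:.2f}  AIC={aic:.2f}  BIC={bic:.2f}")
```

Restarting EM from several random fuzzy memberships is a cheap guard against poor local maxima. Some authors count every matrix entry, rather than every row, as the BIC sample size, which changes the penalty and can change which R is selected.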
