Parsimonious skew mixture models for model-based clustering and classification

Robust mixture modeling approaches based on skewed distributions have recently been explored to accommodate asymmetric data. This paper introduces parsimonious skew-t and skew-normal analogues of the Gaussian parsimonious clustering models (GPCM) family, which employ an eigenvalue decomposition of a scale matrix. Parameters are estimated with the expectation-maximization (EM) algorithm, and models are selected using the Bayesian information criterion (BIC). The methods are compared with existing models in both unsupervised and semi-supervised classification frameworks, and their efficacy is illustrated on simulated and real data sets.
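
As a rough illustration of how BIC-based selection trades fit against parsimony, the sketch below counts the free scale-matrix parameters for four commonly used GPCM-style constraints and picks the model with the best BIC. It is not the paper's implementation. The function names, the convention BIC = 2 * loglik - k * log(n) (larger is better), and the log-likelihood values in the usage example are assumptions made for illustration; the counts cover the scale matrices only and leave out mixing proportions, locations, skewness, and degrees of freedom.

```python
import math


def covariance_param_count(model: str, p: int, G: int) -> int:
    """Free scale-matrix parameters for a few GPCM-style constraints.

    p is the data dimension and G the number of mixture components.
    Only four illustrative members of the family are listed here.
    """
    full = p * (p + 1) // 2  # free entries of one symmetric p x p matrix
    counts = {
        "EII": 1,         # common spherical scale: lambda * I
        "VII": G,         # component-specific spherical scale: lambda_g * I
        "EEE": full,      # one unconstrained matrix shared by all components
        "VVV": G * full,  # one unconstrained matrix per component
    }
    return counts[model]


def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """BIC under the assumed 'larger is better' convention."""
    return 2.0 * loglik - n_params * math.log(n_obs)


def select_model(fits: dict, n_obs: int) -> str:
    """Return the name of the fit with the largest BIC.

    fits maps a model name to (maximized log-likelihood, total parameter count).
    """
    return max(fits, key=lambda name: bic(fits[name][0], fits[name][1], n_obs))


# Hypothetical fits: the log-likelihoods and totals below are made up.
example_fits = {"EII": (-520.0, 10), "VVV": (-500.0, 30)}
best = select_model(example_fits, n_obs=200)
```

With these made-up numbers the more constrained model wins: the 20 extra parameters of the unconstrained fit cost more in penalty than the gain in log-likelihood returns.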
