A mixture of SDB skew-t factor analyzers

Mixtures of skew-t distributions offer a flexible choice for model-based clustering. A mixture model of this sort can be implemented using a variety of formulations of the skew-t distribution. A mixture of skew-t factor analyzers model is developed for clustering high-dimensional data using a flexible formulation of the skew-t distribution. Methodological details of the proposed approach, which extends the mixture of factor analyzers model to a flexible skew-t distribution, are outlined, and details of parameter estimation are provided. Clustering results are illustrated and compared with those from an alternative formulation of the mixture of skew-t factor analyzers model and from the mixture of factor analyzers model.
