On model-based clustering of skewed matrix data

The existing finite mixture modeling and model-based clustering literature focuses primarily on multivariate data observed as vectors, with each element representing a specific feature. In this setting, multivariate Gaussian mixture models have been the most commonly used. Because severe modeling issues arise when normal components cannot fit the groups adequately, much attention has been paid to developing models that account for skewness in data. In our work, we target the problem of mixture modeling with components that can handle skewness in matrix-valued data. A novel matrix mixture model is proposed whose skewness parameters are readily interpretable. The corresponding estimation procedure and various parameterizations are discussed. The proposed developments open a wide range of modeling capabilities with numerous applications, as illustrated in this paper. Comprehensive simulation studies and applications to real-life datasets demonstrate the efficiency of the proposed developments.
