Two new matrix-variate distributions with application in model-based clustering

Abstract

Two matrix-variate distributions, both elliptical heavy-tailed generalizations of the matrix-variate normal distribution, are introduced. They belong to the normal scale mixture family and are obtained by choosing a convenient shifted exponential or uniform mixing distribution, respectively. Moreover, each has a closed-form probability density function characterized by only one additional parameter, relative to the nested matrix-variate normal, which governs the tail weight. Both distributions are then used for model-based clustering via finite mixture models. Because the resulting mixtures handle data with atypical observations better than the matrix-variate normal mixture, they can avoid disrupting the true underlying group structure. Different EM-based algorithms are implemented for parameter estimation and tested in terms of computational time and parameter recovery. Furthermore, these mixture models are fitted to simulated and real datasets, and their fitting and clustering performances are analyzed and compared with those of other well-established competitors.
