Matrix Normal Cluster-Weighted Models

Finite mixtures of regressions with fixed covariates are a widely used model-based clustering methodology for regression data. However, they assume assignment independence, i.e., the allocation of data points to the clusters does not depend on the distribution of the covariates. To relax this assumption, finite mixtures of regressions with random covariates, also known as cluster-weighted models (CWMs), have been proposed in the univariate and multivariate literature. In this paper, the CWM is extended to matrix data, e.g., data in which a set of variables is simultaneously observed at different time points or locations. Specifically, both the cluster-specific marginal distribution of the covariates and the cluster-specific conditional distribution of the responses given the covariates are assumed to be matrix normal. Maximum likelihood parameter estimates are derived using an expectation-conditional maximization (ECM) algorithm. Parameter recovery, classification performance, and the ability of the Bayesian information criterion (BIC) to detect the underlying groups are investigated using simulated data. Finally, two real data applications concerning educational indicators and the Italian non-life insurance market are presented.
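As a rough illustration of the matrix normal building block mentioned above, the following sketch evaluates the log-density of a p × r matrix under a matrix normal distribution. The function name `matrix_normal_logpdf` and the symbols `M` (mean matrix), `Sigma` (p × p row covariance), and `Psi` (r × r column covariance) are illustrative assumptions, not notation taken from the paper. The sketch relies only on the standard fact that a matrix normal variate is equivalent to a multivariate normal on the column-stacked vector with covariance `Psi ⊗ Sigma`.

```python
import numpy as np


def matrix_normal_logpdf(X, M, Sigma, Psi):
    """Log-density of a p x r matrix X under a matrix normal distribution.

    Assumed (hypothetical) notation: M is the p x r mean, Sigma the p x p
    row covariance, Psi the r x r column covariance.
    """
    p, r = X.shape
    D = X - M
    # Log-determinants computed stably via slogdet.
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_psi = np.linalg.slogdet(Psi)
    # Quadratic form tr(Sigma^{-1} D Psi^{-1} D^T), without explicit inverses.
    left = np.linalg.solve(Sigma, D)      # Sigma^{-1} D, shape p x r
    right = np.linalg.solve(Psi, D.T)     # Psi^{-1} D^T, shape r x p
    quad = np.trace(left @ right)
    return -0.5 * (p * r * np.log(2.0 * np.pi)
                   + r * logdet_sigma
                   + p * logdet_psi
                   + quad)
```

Working with the two small covariance matrices rather than the full pr × pr Kronecker product keeps the cost low. In a cluster-weighted setting, one such term would be evaluated for the covariates and another for the responses given the covariates within each cluster.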

[1]  Gianfranco Piras,et al.  splm: Spatial Panel Data Models in R , 2012 .

[2]  Paul D. McNicholas,et al.  Clustering and classification via cluster-weighted factor analyzers , 2012, Advances in Data Analysis and Classification.

[3]  Salvatore Ingrassia,et al.  Modeling Return to Education in Heterogeneous Populations: An Application to Italy , 2017, Statistical Learning of Complex Data.

[4]  P. McNicholas Mixture Model-Based Classification , 2016 .

[5]  Volodymyr Melnykov,et al.  Studying crime trends in the USA over the years 2000–2012 , 2018, Adv. Data Anal. Classif..

[6]  P. McNicholas,et al.  A matrix variate skew‐t distribution , 2017, Pattern Recognit..

[7]  Giovanni Millo,et al.  Non-life insurance consumption in Italy: a sub-regional panel data analysis , 2011, J. Geogr. Syst..

[8]  Cinzia Viroli,et al.  Finite mixtures of matrix normal distributions for classifying three-way data , 2011, Stat. Comput..

[9]  A. Punzo Flexible mixture modelling with the polynomial Gaussian cluster-weighted model , 2012, 1207.0939.

[10]  Salvatore Ingrassia,et al.  Parsimonious Generalized Linear Gaussian Cluster-Weighted Models , 2015 .

[11]  Salvatore Ingrassia,et al.  Model-based clustering via linear cluster-weighted models , 2012, Comput. Stat. Data Anal..

[12]  M. Cugmas,et al.  On comparing partitions , 2015 .

[13]  A. Montanari,et al.  A Matrix-Variate Regression Model with Canonical States: An Application to Elderly Danish Twins , 2014 .

[14]  P. Dutilleul The mle algorithm for the matrix normal distribution , 1999 .

[15]  Paul D. McNicholas,et al.  Model-Based Clustering , 2016, Journal of Classification.

[16]  Ranjan Maitra,et al.  Simulating Data to Study Performance of Finite Mixture Modeling and Clustering Algorithms , 2010 .

[17]  Giorgio Vittadini,et al.  The Generalized Linear Mixed Cluster-Weighted Model , 2015, Journal of Classification.

[18]  Luca Bagnato,et al.  Two new matrix-variate distributions with application in model-based clustering , 2020, Comput. Stat. Data Anal..

[19]  Salvatore Ingrassia,et al.  Clustering bivariate mixed-type data via the cluster-weighted model , 2016, Comput. Stat..

[20]  Ryan P. Browne,et al.  Multivariate Response and Parsimony for Gaussian Cluster-Weighted Models , 2014, Journal of Classification.

[21]  Salvatore Ingrassia,et al.  On parsimonious models for modeling matrix data , 2020, Comput. Stat. Data Anal..

[22]  N. Gershenfeld,et al.  Cluster-weighted modelling for time-series analysis , 1999, Nature.

[23]  Salvatore Ingrassia,et al.  Decision boundaries for mixtures of regressions , 2016 .

[24]  Neil Gershenfeld,et al.  Nonlinear Inference and Cluster‐Weighted Modeling , 1997 .

[25]  Volodymyr Melnykov,et al.  On model-based clustering of skewed matrix data , 2018, J. Multivar. Anal..

[26]  P. McNicholas,et al.  Families of Parsimonious Finite Mixtures of Regression Models , 2013, 1312.0518.

[27]  L. Hubert,et al.  Comparing partitions , 1985 .

[28]  F. Leisch FlexMix: A general framework for finite mixture models and latent class regression in R , 2004 .

[29]  R Core Team,et al.  R: A language and environment for statistical computing. , 2014 .

[30]  Salvatore Ingrassia,et al.  Cluster Validation for Mixtures of Regressions via the Total Sum of Squares Decomposition , 2020, J. Classif..

[31]  Maurizio Vichi,et al.  Studies in Classification Data Analysis and knowledge Organization , 2011 .

[32]  Sylvia Frühwirth-Schnatter,et al.  Finite Mixture and Markov Switching Models , 2006 .

[33]  Paul D. McNicholas,et al.  Robust Clustering in Regression Analysis via the Contaminated Gaussian Cluster-Weighted Model , 2014, J. Classif..

[34]  Cinzia Viroli,et al.  On matrix-variate regression analysis , 2012, J. Multivar. Anal..

[35]  George B. Macready,et al.  Concomitant-Variable Latent-Class Models , 1988 .

[36]  Xiao-Li Meng,et al.  Maximum likelihood estimation via the ECM algorithm: A general framework , 1993 .

[37]  C. Viroli,et al.  Covariance pattern mixture models for the analysis of multivariate heterogeneous longitudinal data , 2014, 1401.1301.

[38]  Paul D. McNicholas,et al.  Finite mixtures of skewed matrix variate distributions , 2018, Pattern Recognit..

[39]  Xiao-Li Meng,et al.  The EM Algorithm—an Old Folk‐song Sung to a Fast New Tune , 1997 .

[40]  G. Schwarz Estimating the Dimension of a Model , 1978 .

[41]  Salvatore Ingrassia,et al.  flexCWM: A Flexible Framework for Cluster-Weighted Models , 2018 .

[42]  W. DeSarbo,et al.  A maximum likelihood methodology for clusterwise linear regression , 1988 .

[43]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[44]  Volodymyr Melnykov,et al.  An effective strategy for initializing the EM algorithm in finite mixture models , 2016, Advances in Data Analysis and Classification.

[45]  Geoffrey J. McLachlan,et al.  Finite Mixture Models , 2019, Annual Review of Statistics and Its Application.

[46]  Giorgio Vittadini,et al.  Local Statistical Modeling via a Cluster-Weighted Approach with Elliptical Distributions , 2012, J. Classif..