General location model with factor analyzer covariance matrix structure and its applications

The general location model (GLOM) is a well-known model for analyzing mixed data. In a GLOM, the joint distribution of the variables is decomposed into the conditional distribution of the continuous variables given the categorical outcomes and the marginal distribution of the categorical variables. The original GLOM assumes that the covariance matrices of the continuous multivariate distributions are equal across cells, where each cell corresponds to a distinct combination of levels of the categorical variables. In this paper, GLOMs are considered under both equal and unequal covariance matrices. Three covariance structures are used across cells: the same factor analyzer, factor analyzers with unequal specific variance matrices (in general and parsimonious forms), and factor analyzers with common factor loadings. These structures serve both to model the covariance structure and to reduce the number of parameters. Maximum likelihood estimates of the parameters are computed via the EM algorithm. As an application of these models, we investigate the classification of continuous variables within cells. Based on these models, classification is carried out for ordinary as well as high-dimensional data sets. Finally, to show the applicability of the proposed models to classification, results from the analysis of three real data sets are presented.
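
To make the parameter-reduction argument concrete, the following minimal sketch builds a covariance matrix with the standard factor-analyzer form, Sigma = Lambda Lambda^T + Psi with diagonal Psi, and compares its number of free parameters with that of an unstructured covariance matrix. The notation, the function names, and the dimensions (p = 10 continuous variables, q = 2 factors, 4 cells) are illustrative assumptions, not taken from the paper, and the sketch covers only the "same factor analyzer" case.

```python
import numpy as np


def fa_covariance(loadings, specific_variances):
    """Return Sigma = Lambda @ Lambda.T + Psi, with Psi diagonal.

    loadings: (p, q) factor loading matrix Lambda (hypothetical name).
    specific_variances: length-p vector of positive specific variances.
    """
    loadings = np.asarray(loadings, dtype=float)
    psi = np.diag(np.asarray(specific_variances, dtype=float))
    return loadings @ loadings.T + psi


def n_cov_params_full(p):
    """Free parameters in an unstructured p x p covariance matrix."""
    return p * (p + 1) // 2


def n_cov_params_fa(p, q):
    """Free parameters in a q-factor analyzer covariance.

    p*q loadings plus p specific variances, minus q(q-1)/2 to account
    for the rotational indeterminacy of the loadings.
    """
    return p * q + p - q * (q - 1) // 2


# Illustrative sizes only (assumptions, not values from the paper).
p, q, n_cells = 10, 2, 4
rng = np.random.default_rng(0)
Lambda = rng.normal(size=(p, q))
psi = rng.uniform(0.5, 1.5, size=p)

# One factor-analyzer covariance shared by all cells.
Sigma = fa_covariance(Lambda, psi)

full_per_cell = n_cov_params_full(p)       # unstructured, one cell
full_all_cells = n_cells * full_per_cell   # unstructured, unequal across cells
fa_shared = n_cov_params_fa(p, q)          # same factor analyzer in every cell
print(full_per_cell, full_all_cells, fa_shared)
```

Under these assumed sizes, the shared factor-analyzer structure needs far fewer covariance parameters than unstructured cell-specific matrices, and the gap widens as p grows, which is why such structures are attractive for high-dimensional data.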
