Dimensionally Reduced Model-Based Clustering Through Mixtures of Factor Mixture Analyzers

Dimensionally reduced model-based clustering methods have recently attracted wide interest in statistics as tools for performing clustering and dimension reduction simultaneously through one or more latent variables. Among these, Mixtures of Factor Analyzers assume that, within each component, the data are generated according to a factor model, which reduces the number of parameters on which the covariance matrices depend. In Factor Mixture Analysis, clustering is performed through the factors of an ordinary factor analysis, which are jointly modelled by a Gaussian mixture. The two approaches differ in genesis, parameterization and, consequently, clustering performance. In this work we propose a model that extends and combines them. The proposed Mixtures of Factor Mixture Analyzers provide a unified class of dimensionally reduced mixture models that includes the previous ones as special cases and could offer a powerful tool for modelling non-Gaussian latent variables.
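
To make the combination of the two approaches concrete, the following is a minimal generative sketch, not the paper's estimation procedure. It assumes a two-level structure in which an outer mixture selects a factor model (mean, loadings, diagonal noise variances) and the latent factors are drawn from a Gaussian mixture shared by all outer components. The abstract does not spell out this parameterization, so the sharing of the latent mixture, the function name `sample_mfma`, and all argument names and numeric values are illustrative assumptions.

```python
import numpy as np

def sample_mfma(n, outer_w, etas, loadings, psis,
                inner_w, inner_means, inner_covs, seed=0):
    """Draw n observations from a two-level factor mixture (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Outer label: which factor model generates the observation.
    outer = rng.choice(len(outer_w), size=n, p=outer_w)
    # Inner label: which Gaussian component generates the latent factors
    # (assumed here to be shared across outer components).
    inner = rng.choice(len(inner_w), size=n, p=inner_w)
    dim = len(etas[0])
    y = np.empty((n, dim))
    for t in range(n):
        i, j = outer[t], inner[t]
        z = rng.multivariate_normal(inner_means[j], inner_covs[j])  # latent factors
        e = rng.normal(0.0, np.sqrt(psis[i]))  # independent noise, diagonal variances
        y[t] = etas[i] + loadings[i] @ z + e
    return y, outer, inner

# Hypothetical settings: 3 observed variables, 1 latent factor,
# 2 outer components and 2 inner (latent) components.
q, p = 1, 3
y, outer, inner = sample_mfma(
    n=200,
    outer_w=[0.5, 0.5],
    etas=[np.zeros(p), np.full(p, 4.0)],
    loadings=[np.ones((p, q)), np.array([[1.0], [-1.0], [0.5]])],
    psis=[np.full(p, 0.1), np.full(p, 0.2)],
    inner_w=[0.3, 0.7],
    inner_means=[np.array([-2.0]), np.array([1.0])],
    inner_covs=[np.eye(q), 0.5 * np.eye(q)],
)
```

Under these assumptions, a single outer component gives a Factor Mixture Analysis style model (one factor model whose factors follow a Gaussian mixture), while a single inner component with zero mean and identity covariance gives a Mixtures of Factor Analyzers style model, which is the sense in which both appear as special cases.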
