Algorithms for Model-based Block Gaussian Clustering

When the data consist of a set of objects described by a set of continuous variables, clustering can target the objects (rows), the variables (columns), or both sets simultaneously. Focusing on this last, simultaneous type of clustering, we propose a new mixture model and develop an adapted Generalized EM (GEM) algorithm under the maximum likelihood approach, together with a Classification GEM (CGEM) version under the classification maximum likelihood approach. We present the steps of these new algorithms and show their relevance in a data mining context. Using illustrative synthetic data, we evaluate their performance against EM and Classification EM (CEM) applied to the set of objects given a partition of the variables, and against EM and CEM applied to the set of objects alone.
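To make the idea of clustering rows and columns simultaneously concrete, here is a minimal sketch of a hard-assignment (classification-style) block clustering loop. It is not the GEM or CGEM algorithm of the paper, whose details are not given in this abstract. It assumes, purely for illustration, that every block (row cluster k, column cluster l) is Gaussian with its own mean, that all blocks share one variance, and that cluster proportions are equal; under those assumptions, maximizing the classification likelihood amounts to minimizing the squared error around the block means. The names `block_means` and `block_cem_sketch` are hypothetical.

```python
import numpy as np

def block_means(X, z, w, g, m):
    """Mean of each (row cluster, column cluster) block.

    Empty blocks fall back to the global mean of X (an arbitrary choice
    made for this sketch).
    """
    Z = np.eye(g)[z]                       # n x g one-hot row memberships
    W = np.eye(m)[w]                       # d x m one-hot column memberships
    sums = Z.T @ X @ W                     # g x m block sums
    counts = np.outer(Z.sum(axis=0), W.sum(axis=0))
    mu = np.full((g, m), X.mean())
    np.divide(sums, counts, out=mu, where=counts > 0)
    return mu

def block_cem_sketch(X, g, m, n_iter=20, seed=0):
    """Alternate hard row and column assignments (assumed simplification)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    z = rng.integers(g, size=n)            # random initial row partition
    w = rng.integers(m, size=d)            # random initial column partition
    for _ in range(n_iter):
        # Row step: with the column partition fixed, assign each row to the
        # row cluster whose block means fit it best in squared error.
        mu = block_means(X, z, w, g, m)
        cost_rows = ((X[:, None, :] - mu[:, w][None, :, :]) ** 2).sum(axis=2)
        z = cost_rows.argmin(axis=1)
        # Column step: the same update with the roles of rows and columns swapped.
        mu = block_means(X, z, w, g, m)
        cost_cols = ((X.T[:, None, :] - mu[z, :].T[None, :, :]) ** 2).sum(axis=2)
        w = cost_cols.argmin(axis=1)
    return z, w, block_means(X, z, w, g, m)

# Usage on synthetic data with one shifted block.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 12))
X[:15, :6] += 5.0
z, w, mu = block_cem_sketch(X, g=2, m=2)
```

Like other alternating schemes of this kind, the loop depends on its random start and may stop at a local optimum, so in practice one would typically run it from several initial partitions and keep the best result.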
