Model-based methods to identify multiple cluster structures in a data set

This paper addresses the problem of identifying multiple cluster structures: different partitions of the same set of units, each obtained from a different subset of the observed variables. A model-based procedure was previously developed for detecting multiple cluster structures from independent subsets of variables. It relies on model-based clustering and compares mixture models using the Bayesian Information Criterion (BIC). Here, a generalization of this procedure that allows the use of any model-selection criterion is considered, and a new approach is proposed that combines the generalized procedure with variable-clustering methods. The usefulness of the new method is illustrated on simulated and real examples. Monte Carlo methods are used to evaluate the performance of the various approaches on data matrices with two cluster structures, taking into account the separation of the clusters, the heterogeneity within clusters, and the dependence between the cluster structures.
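The comparison step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a separate mixture model has already been fitted to each block of variables by some clustering library, and that the blocks are treated as independent, so that block log-likelihoods and parameter counts simply add. The function names (`bic`, `aic`, `score_partition`, `best_partition`), the "larger is better" form of the criteria, and all numbers are assumptions made for illustration only.

```python
import math

def bic(loglik, n_params, n_obs):
    """BIC in the 'larger is better' form: 2*logL - k*log(n)."""
    return 2.0 * loglik - n_params * math.log(n_obs)

def aic(loglik, n_params, n_obs):
    """AIC in the same orientation: 2*logL - 2*k (n_obs is unused)."""
    return 2.0 * loglik - 2.0 * n_params

def score_partition(block_fits, n_obs, criterion=bic):
    """Score a partition of the variables into independent blocks.

    block_fits is a list of (loglik, n_params) pairs, one per block,
    each taken from a mixture model fitted to that block alone.
    Under independence between blocks the criterion values add up.
    """
    return sum(criterion(ll, k, n_obs) for ll, k in block_fits)

def best_partition(candidates, n_obs, criterion=bic):
    """Return the name of the candidate partition with the highest score."""
    return max(
        candidates,
        key=lambda name: score_partition(candidates[name], n_obs, criterion),
    )

# Hypothetical fitted values (NOT results from the paper): 200 units,
# four variables, either clustered jointly or split into two blocks.
n_obs = 200
candidates = {
    "one structure {x1,x2,x3,x4}": [(-1150.0, 17)],
    "two structures {x1,x2} | {x3,x4}": [(-560.0, 9), (-570.0, 9)],
}

for name, fits in candidates.items():
    print(name, round(score_partition(fits, n_obs), 2))
print("selected:", best_partition(candidates, n_obs))
```

Passing `criterion=aic`, or any other function with the same three arguments, swaps the model-selection criterion without changing the rest of the comparison, which is the sense in which the procedure is generalized beyond BIC.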
