Mixture-based clustering for the ordered stereotype model

Many methods for reducing the dimensionality of data matrices rely on mathematical techniques such as distance-based algorithms, matrix decomposition and eigenvalue methods. Recently, a family of likelihood-based finite mixture models has been developed for data matrices of binary or count data, using Bernoulli or Poisson distributions as basic building blocks. This work extends that approach by establishing likelihood-based multivariate methods for data matrices of ordinal data, applying fuzzy clustering via finite mixtures to the ordered stereotype model. Model fitting is performed with the expectation-maximization (EM) algorithm, yielding a fuzzy allocation to clusters of rows, of columns, or of rows and columns simultaneously. A simulation study covering a variety of scenarios is presented to test the reliability of the proposed model. Finally, the results of applying the model to two real data sets are shown.

Highlights:
- New methodology for clustering rows and columns from a matrix of ordinal data.
- Establishes likelihood-based methods via finite mixtures with the stereotype model.
- Tests the reliability of this methodology through a simulation study.
- Illustrates this new approach with two examples.
- Reviews and compares the performance of several model choice measures.
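The abstract does not spell out the ordered stereotype model itself. As a minimal sketch, assuming the usual formulation with q ordered categories, log[P(Y = k) / P(Y = 1)] = μ_k + φ_k η for a linear predictor η, with μ_1 = 0 and 0 = φ_1 ≤ φ_2 ≤ ... ≤ φ_q = 1, the category probabilities can be computed as below. The function name `stereotype_probs` and the example parameter values are hypothetical and only illustrate the formula.

```python
import numpy as np


def stereotype_probs(mu, phi, eta):
    """Category probabilities under the ordered stereotype model.

    Assumed form: log[P(Y = k) / P(Y = 1)] = mu_k + phi_k * eta, k = 1..q,
    with mu_1 = 0 and 0 = phi_1 <= phi_2 <= ... <= phi_q = 1.
    Returns an array of shape eta.shape + (q,) whose last axis sums to 1.
    """
    mu = np.asarray(mu, dtype=float)
    phi = np.asarray(phi, dtype=float)
    # Check the (assumed) identifiability and ordering constraints.
    if mu[0] != 0 or phi[0] != 0 or phi[-1] != 1 or np.any(np.diff(phi) < 0):
        raise ValueError("need mu_1 = 0 and 0 = phi_1 <= ... <= phi_q = 1")
    eta = np.asarray(eta, dtype=float)
    # Log-odds of every category against the baseline category 1.
    logits = mu + phi * eta[..., None]
    # Softmax over the category axis; subtracting the max avoids overflow.
    logits = logits - logits.max(axis=-1, keepdims=True)
    w = np.exp(logits)
    return w / w.sum(axis=-1, keepdims=True)


# Example with q = 4 categories evaluated at three values of the linear predictor.
p = stereotype_probs(mu=[0.0, 0.5, 0.2, -0.5], phi=[0.0, 0.3, 0.7, 1.0],
                     eta=[-2.0, 0.0, 2.0])
```

The monotone scores φ_k are what make the model "ordered": since log[P(Y = k) / P(Y = k')] = (μ_k − μ_k') + (φ_k − φ_k')η, increasing η shifts probability toward categories with larger φ_k, while the spacing of the φ_k is estimated from the data rather than fixed in advance.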

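The EM fit mentioned in the abstract alternates between computing fuzzy cluster memberships (E-step) and updating the parameters (M-step). The Python sketch below shows only the E-step for clustering rows, assuming the parameterization log[P(y_ij = k | i in cluster r) / P(y_ij = 1 | i in cluster r)] = μ_k + φ_k(α_r + β_j), mixing proportions π_r, and independence of the entries of a row given its cluster. The names `category_probs` and `e_step`, the argument layout and the toy data are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def category_probs(mu, phi, eta):
    """P(Y = k), k = 1..q, for log[P(Y = k) / P(Y = 1)] = mu_k + phi_k * eta."""
    logits = np.asarray(mu) + np.asarray(phi) * np.asarray(eta)[..., None]
    logits = logits - logits.max(axis=-1, keepdims=True)
    w = np.exp(logits)
    return w / w.sum(axis=-1, keepdims=True)


def e_step(Y, pi, mu, phi, alpha, beta):
    """E-step for a row-clustered stereotype mixture (assumed parameterization).

    Y     : (n, p) integer matrix of ordinal responses coded 1..q.
    pi    : (R,) mixing proportions; alpha : (R,) row-cluster effects;
    beta  : (p,) column effects; mu, phi : (q,) stereotype parameters.
    Returns (Z, loglik): Z[i, r] is the posterior probability that row i
    belongs to cluster r (the fuzzy allocation); loglik is the
    incomplete-data log-likelihood at these parameter values.
    """
    Y = np.asarray(Y)
    # theta[r, j, k] = P(y_ij = k + 1 | row i in cluster r), shape (R, p, q).
    theta = category_probs(mu, phi, np.add.outer(alpha, beta))
    # Pick out the probability of each observed category: shape (R, n, p).
    cols = np.arange(Y.shape[1])
    log_obs = np.log(theta[:, cols, Y - 1])
    # Entries are independent given the cluster, so log-probabilities add.
    log_joint = np.log(pi)[:, None] + log_obs.sum(axis=2)  # (R, n)
    # Normalize over clusters with a log-sum-exp for numerical stability.
    m = log_joint.max(axis=0, keepdims=True)
    log_marg = m + np.log(np.exp(log_joint - m).sum(axis=0, keepdims=True))
    Z = np.exp(log_joint - log_marg).T  # (n, R)
    return Z, float(log_marg.sum())


# Toy data: two rows with low responses, two with high responses (q = 3).
Y = np.array([[1, 1, 2], [1, 2, 1], [3, 3, 2], [3, 2, 3]])
Z, ll = e_step(Y, pi=np.array([0.5, 0.5]), mu=np.array([0.0, 0.2, -0.2]),
               phi=np.array([0.0, 0.6, 1.0]), alpha=np.array([-2.0, 2.0]),
               beta=np.zeros(3))
# M-step for the mixing proportions: average posterior membership per cluster.
pi_new = Z.mean(axis=0)
```

Only the update of π is shown in closed form. Updating μ, φ, α and β requires maximizing the expected complete-data log-likelihood Σ_i Σ_r z_ir Σ_j log θ_{r j y_ij} under the ordering and identifiability constraints, which calls for a numerical optimizer and is omitted here. Under the same assumptions, column clustering has the same structure with the roles of rows and columns swapped, so the sketch can be reused on the transposed matrix.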