Panel data analysis: a survey on model-based clustering of time series

Clustering is a widely used statistical tool for identifying subsets in a given data set. Commonly used clustering methods are mostly based on distance measures and cannot easily be extended to cluster time series within a panel or longitudinal data set. This paper reviews recently suggested approaches to model-based clustering of panel or longitudinal data based on finite mixture models. Several approaches are considered that are suitable for both continuous and categorical time series observations. Bayesian estimation through Markov chain Monte Carlo methods is described in detail, and various criteria for selecting the number of clusters are reviewed. An application to a panel of marijuana use among teenagers serves as an illustration.
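To make the idea of model-based clustering of categorical time series concrete, the following is a minimal sketch and not the paper's own implementation. It assumes each cluster is a first-order Markov chain with its own transition matrix, that the number of clusters is fixed in advance, that all priors are uniform Dirichlet distributions, and that the likelihood conditions on the first observation of each series. All function names (`simulate_chain`, `gibbs_markov_clustering`, and so on) are hypothetical and chosen for illustration only.

```python
import math
import random


def simulate_chain(persistence, length, rng):
    """Simulate a two-state Markov chain that stays put with probability `persistence`."""
    state = rng.randrange(2)
    path = [state]
    for _ in range(length - 1):
        if rng.random() >= persistence:
            state = 1 - state
        path.append(state)
    return path


def transition_counts(seq, n_states):
    """Count transitions j -> l in one categorical time series."""
    counts = [[0] * n_states for _ in range(n_states)]
    for prev, curr in zip(seq, seq[1:]):
        counts[prev][curr] += 1
    return counts


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalising independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def safe_log(p):
    """Logarithm guarded against a numerically zero probability."""
    return math.log(max(p, 1e-300))


def sample_index(log_weights, rng):
    """Sample an index with probability proportional to exp(log_weights)."""
    top = max(log_weights)
    weights = [math.exp(lw - top) for lw in log_weights]
    u = rng.random() * sum(weights)
    acc = 0.0
    for k, w in enumerate(weights):
        acc += w
        if u < acc:
            return k
    return len(weights) - 1


def gibbs_markov_clustering(series, n_clusters, n_states, n_sweeps=200, seed=0):
    """Gibbs sampler for a finite mixture of first-order Markov chains.

    Returns the cluster allocations drawn in the final sweep.
    """
    rng = random.Random(seed)
    counts = [transition_counts(s, n_states) for s in series]
    alloc = [rng.randrange(n_clusters) for _ in series]
    for _ in range(n_sweeps):
        # Step 1: cluster weights given allocations, Dirichlet(1 + cluster sizes).
        sizes = [1.0 + alloc.count(k) for k in range(n_clusters)]
        log_eta = [safe_log(w) for w in sample_dirichlet(sizes, rng)]
        # Step 2: transition matrices given allocations; each row is
        # Dirichlet(1 + transition counts pooled over the cluster's series).
        log_xi = []
        for k in range(n_clusters):
            rows = []
            for j in range(n_states):
                pooled = [1.0 + sum(c[j][l] for c, a in zip(counts, alloc) if a == k)
                          for l in range(n_states)]
                rows.append([safe_log(p) for p in sample_dirichlet(pooled, rng)])
            log_xi.append(rows)
        # Step 3: allocations given parameters, with posterior probability
        # proportional to cluster weight times the series likelihood.
        for i, c in enumerate(counts):
            log_post = [log_eta[k] + sum(c[j][l] * log_xi[k][j][l]
                                         for j in range(n_states)
                                         for l in range(n_states))
                        for k in range(n_clusters)]
            alloc[i] = sample_index(log_post, rng)
    return alloc
```

A call such as `gibbs_markov_clustering(panel, n_clusters=2, n_states=2)`, where `panel` is a list of integer-coded series, returns one cluster label per series. The labels themselves are arbitrary (the label switching problem), so only the induced partition is meaningful. A fuller analysis would average over many sweeps after burn-in rather than keep a single draw, and would compare fits across different numbers of clusters.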
