Co-clustering of Time-Dependent Data via the Shape Invariant Model

Multivariate time-dependent data, in which multiple features are observed over time for a set of individuals, are increasingly widespread in many application domains. Modelling these data requires accounting for relations among both time instants and variables, as well as for subject heterogeneity. We propose a new co-clustering methodology that groups individuals and variables simultaneously and is designed to handle both functional and longitudinal data. Our approach borrows concepts from the curve registration framework by embedding the shape invariant model in the latent block model, estimated via a suitable modification of the SEM-Gibbs algorithm. The resulting procedure allows for several user-defined specifications of the notion of cluster, which can be chosen on substantive grounds, and provides parsimonious summaries of complex time-dependent data by partitioning data matrices into homogeneous blocks. Together with the explicit modelling of time evolution, these features make the clusters easy to interpret, a benefit that extends to low-dimensional settings as well.
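To make the combination of the two models concrete, the following is a minimal generative sketch, not the authors' implementation. It assumes the four-parameter form of the shape invariant model common in the self-modelling regression literature, x_ij(t) = a1 + exp(a2) * m_kl((t - a3) / exp(a4)) + noise, where row i belongs to row cluster k, column j belongs to column cluster l, and m_kl is the block-specific mean curve. The function names (`sim_curve`, `simulate_lbm_sim`), the zero-mean Gaussian random effects shared by all blocks, and the noise levels are illustrative assumptions. Estimation (the modified SEM-Gibbs algorithm) is not sketched here.

```python
import numpy as np


def sim_curve(t, m, alpha):
    """Shape invariant model (assumed parameterization):
    vertical shift a1, log amplitude scale a2, horizontal shift a3, log time scale a4."""
    a1, a2, a3, a4 = alpha
    return a1 + np.exp(a2) * m((t - a3) / np.exp(a4))


def simulate_lbm_sim(z, w, mean_curves, t, sigma_alpha=0.1, sigma_eps=0.05, seed=None):
    """Simulate an (n, p, T) array of curves from a latent block model whose
    cells follow the shape invariant model around a block-specific mean curve.

    z           -- row (individual) cluster labels, length n
    w           -- column (variable) cluster labels, length p
    mean_curves -- nested list: mean_curves[k][l] is a callable m_kl(t)
    Assumption: random effects are i.i.d. N(0, sigma_alpha^2) for every block.
    """
    rng = np.random.default_rng(seed)
    n, p = len(z), len(w)
    X = np.empty((n, p, t.size))
    for i in range(n):
        for j in range(p):
            m = mean_curves[z[i]][w[j]]                   # block (k, l) shape
            alpha = rng.normal(0.0, sigma_alpha, size=4)  # cell-level random effects
            noise = rng.normal(0.0, sigma_eps, size=t.size)
            X[i, j] = sim_curve(t, m, alpha) + noise
    return X
```

With `sigma_alpha=0` and `sigma_eps=0`, every cell reproduces its block mean curve exactly, which makes the block structure easy to inspect; increasing `sigma_alpha` produces the shifted and rescaled versions of a common shape that curve registration is meant to align.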
