Model-Based Co-Clustering of Multivariate Functional Data

High-dimensional data clustering is a topic of growing interest in the statistical analysis of heterogeneous large-scale data. In this paper, we consider the problem of clustering heterogeneous high-dimensional data in which the individuals are described by functional variables that exhibit a dynamical longitudinal structure. We address the problem in the framework of model-based co-clustering and propose the functional latent block model (FLBM). The FLBM simultaneously clusters a sample of multivariate functions into a finite set of blocks, each block being the association of a cluster of individuals and a cluster of functional variables. Furthermore, the homogeneous set within each block is modeled with a dedicated latent process functional regression model, which allows the block to be segmented according to an underlying dynamical structure. The proposed model thus fully exploits the structure of the data, in contrast to classical latent block clustering models for continuous non-functional data, which ignore the functional structure of the observations. The FLBM can therefore be used for simultaneous co-clustering and segmentation of multivariate non-stationary functions. For unsupervised inference of the FLBM, we propose a variational expectation-maximization (EM) algorithm (VEM-FLBM) that monotonically maximizes a variational approximation of the observed-data log-likelihood.
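To make the notion of a block concrete, the following minimal Python sketch shows how a partition of individuals and a partition of functional variables jointly define blocks of curves. It is only an illustration of the block structure, not the VEM-FLBM algorithm: the row and column labels are assumed to be given rather than inferred, each block is summarized by a plain mean curve instead of a latent process regression model, and the function name `block_mean_curves`, the array layout `(n, d, T)`, and the toy data are all hypothetical.

```python
import numpy as np


def block_mean_curves(X, z, w, K, L):
    """Average the curves falling in each (row cluster, column cluster) block.

    X : array of shape (n, d, T), n individuals, d functional variables,
        each curve sampled at T common time points (an assumption).
    z : length-n integer labels in [0, K) for the individuals.
    w : length-d integer labels in [0, L) for the functional variables.
    Returns an array of shape (K, L, T); empty blocks are filled with NaN.
    """
    X = np.asarray(X, dtype=float)
    z = np.asarray(z)
    w = np.asarray(w)
    T = X.shape[2]
    means = np.full((K, L, T), np.nan)
    for k in range(K):
        rows = np.flatnonzero(z == k)
        for l in range(L):
            cols = np.flatnonzero(w == l)
            if rows.size and cols.size:
                # Sub-array of all curves in block (k, l): (len(rows), len(cols), T)
                block = X[np.ix_(rows, cols)]
                means[k, l] = block.mean(axis=(0, 1))
    return means


# Toy data: 3 individuals, 4 variables, 5 time points.
# Each curve is constant at 10 * (row cluster) + (column cluster).
z = np.array([0, 0, 1])
w = np.array([0, 1, 1, 0])
X = np.empty((3, 4, 5))
for i in range(3):
    for j in range(4):
        X[i, j, :] = 10 * z[i] + w[j]

means = block_mean_curves(X, z, w, K=2, L=2)
```

In the FLBM described above, the labels `z` and `w` are latent and estimated jointly with the block parameters; the sketch fixes them only to show which curves a block groups together.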
