Faster Functional Clustering via Gaussian Mixture Models

Functional data analysis (FDA) is a modern paradigm for handling infinite-dimensional data. A key task in FDA is model-based clustering, which organizes functional populations into groups according to subpopulation structure. The most common approach to model-based clustering of functional data uses mixtures of linear mixed-effects models (MLMMs), which require a computationally intensive estimation algorithm. We provide a novel Gaussian mixture model (GMM) characterization of the model-based clustering problem, and demonstrate that it yields faster computation than the MLMM approach when applied via available functions in the R programming environment. Theoretical considerations for the GMM approach are discussed. As a demonstration of the effectiveness of the simpler GMM approach, we apply it to a dataset based on calcium imaging of the larval zebrafish brain.
