Vertically Shifted Mixture Models for Clustering Longitudinal Data by Shape

Longitudinal studies play a prominent role in the health, social, and behavioral sciences, as well as in the biological sciences, economics, and marketing. By following subjects over time, researchers can directly observe and study temporal changes in an outcome of interest. An important question concerns the existence of distinct trajectory patterns. One way to identify these patterns is cluster analysis, which seeks to separate objects (subjects, patients, observational units) into homogeneous groups. Many clustering methods have been adapted for longitudinal data, but almost all of them fail to explicitly group trajectories according to distinct pattern shapes. To address the need for clustering based explicitly on shape, we propose vertically shifting the data, subtracting each subject's mean to directly remove the level, before fitting a mixture model. This non-invertible transformation can result in singular covariance matrices (each shifted trajectory sums to zero, so its components are linearly dependent), which makes mixture model estimation difficult. Despite these challenges, the proposed method outperforms existing clustering methods in a simulation study.
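A minimal NumPy sketch of the vertical shift, assuming a balanced design in which every subject is observed at the same m time points. The function name `vertical_shift` and the simulated two-shape data are hypothetical illustrations, not the authors' implementation. The sketch also shows why the shift yields a singular covariance matrix: every shifted trajectory sums to zero, so the vector of ones lies in the null space and the m × m covariance has rank at most m − 1.

```python
import numpy as np


def vertical_shift(y):
    """Subtract each subject's mean from that subject's observations.

    Assumes a balanced design: y has shape (n_subjects, m_timepoints).
    """
    y = np.asarray(y, dtype=float)
    return y - y.mean(axis=1, keepdims=True)


# Hypothetical toy data: two shapes (increasing, decreasing) whose
# subject-specific levels vary far more than the shapes themselves.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 5)  # m = 5 time points
n = 50                        # subjects per shape group
levels = rng.normal(0.0, 10.0, size=(2 * n, 1))
shapes = np.vstack([np.tile(2 * t, (n, 1)), np.tile(-2 * t, (n, 1))])
y = levels + shapes + rng.normal(0.0, 0.1, size=(2 * n, t.size))

y_star = vertical_shift(y)

# Each shifted trajectory sums to zero (up to rounding error).
print("rows sum to zero:", np.allclose(y_star.sum(axis=1), 0.0))

# Raw data: full-rank covariance. Shifted data: rank-deficient covariance,
# so a Gaussian density on all m coordinates is not defined.
cov_raw = np.cov(y, rowvar=False)
cov_star = np.cov(y_star, rowvar=False)
print("rank before shift:", np.linalg.matrix_rank(cov_raw, tol=1e-8))
print("rank after shift: ", np.linalg.matrix_rank(cov_star, tol=1e-8))
```

The explicit `tol=1e-8` separates the numerically zero eigenvalue created by the shift from the small but genuine noise variance; the comparison of the two printed ranks is the point of the example.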
