Time-varying clustering of multivariate longitudinal observations

Abstract: We propose a statistical method for clustering multivariate longitudinal data into homogeneous groups. The method is a time-varying extension of the classical K-means algorithm in which a vector autoregressive model is additionally assumed for the evolution of the cluster centroids over time. Inference is based on least squares and carried out with a coordinate descent algorithm. To illustrate the approach, we analyze a longitudinal dataset on human development with three variables: life expectancy, education and gross domestic product.
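The abstract does not state the objective function, so the following is only a minimal sketch of how such a scheme could look, not the authors' estimator. It assumes a penalized least-squares criterion: squared distance of each observation to its cluster centroid at that time point, plus a weight `lam` times the squared residual of a VAR(1) recursion on the centroids. The function names, the penalty weight `lam`, the single transition matrix `A` shared by all clusters, and the choice to let labels change at each time point are all illustrative assumptions. The coordinate descent cycles over three blocks (labels, `A`, centroids), each of which is minimized exactly given the other two.

```python
import numpy as np

def objective(Y, Z, M, A, lam):
    """Assumed criterion: K-means fit term + lam * VAR(1) residual on centroids."""
    T = Y.shape[1]
    fit = np.sum((Y - M[Z, np.arange(T)[None, :]]) ** 2)
    dyn = np.sum((M[:, 1:] - M[:, :-1] @ A.T) ** 2)
    return fit + lam * dyn

def fit_tv_kmeans(Y, K, lam=1.0, n_iter=20, seed=0):
    """Hypothetical sketch. Y has shape (n units, T >= 2 times, p variables)."""
    Y = np.asarray(Y, dtype=float)
    n, T, p = Y.shape
    rng = np.random.default_rng(seed)
    # Initialise centroid trajectories with K randomly chosen units.
    M = Y[rng.choice(n, size=K, replace=False)].copy()  # shape (K, T, p)
    A = np.eye(p)
    history = []
    for _ in range(n_iter):
        # Block 1: assign each unit at each time to the nearest centroid.
        D = np.sum((Y[:, None] - M[None]) ** 2, axis=-1)  # shape (n, K, T)
        Z = np.argmin(D, axis=1)                          # shape (n, T)
        # Block 2: least-squares VAR(1) matrix, mu_t ~ A mu_{t-1}.
        X = M[:, :-1].reshape(-1, p)
        R = M[:, 1:].reshape(-1, p)
        A = np.linalg.lstsq(X, R, rcond=None)[0].T
        # Block 3: update each centroid by solving its normal equations.
        for k in range(K):
            for t in range(T):
                mask = Z[:, t] == k
                H = mask.sum() * np.eye(p)
                b = Y[mask, t].sum(axis=0)
                if t > 0:        # penalty linking mu_t to mu_{t-1}
                    H = H + lam * np.eye(p)
                    b = b + lam * (A @ M[k, t - 1])
                if t < T - 1:    # penalty linking mu_{t+1} to mu_t
                    H = H + lam * (A.T @ A)
                    b = b + lam * (A.T @ M[k, t + 1])
                # lstsq also copes with an empty cluster (singular H).
                M[k, t] = np.linalg.lstsq(H, b, rcond=None)[0]
        history.append(objective(Y, Z, M, A, lam))
    return Z, M, A, history
```

Because every block is an exact minimization of the same criterion, the values stored in `history` should not increase from one sweep to the next, which gives a simple convergence check. With `lam=0` the sketch reduces to running ordinary K-means updates separately at each time point.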
