Linear Transformations and the k-Means Clustering Algorithm

Functional data can be clustered by plugging estimated regression coefficients from individual curves into the k-means algorithm. Clustering results can differ depending on how the curves are fit to the data: estimating curves with different sets of basis functions corresponds to applying different linear transformations to the data, and k-means clustering is not invariant to linear transformations (other than orthogonal transformations, translations, and uniform rescaling). The optimal linear transformation for clustering stretches the distribution so that the primary direction of variability aligns with actual differences between the clusters. It is shown that clustering the raw data will often give results similar to clustering regression coefficients obtained using an orthogonal design matrix. Clustering functional data using an L2 metric on function space can be achieved by clustering a suitable linear transformation of the regression coefficients. An example in which depressed individuals are treated with an antidepressant is used for illustration.
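The two claims about linear transformations can be checked numerically. For a fitted curve f_i(t) = φ(t)ᵀ b_i, the squared L2 distance is (b_i − b_j)ᵀ W (b_i − b_j) with W = ∫ φ(t) φ(t)ᵀ dt, so if W = L Lᵀ then the Euclidean distance between Lᵀ b_i and Lᵀ b_j equals the L2 distance between the curves. With an orthonormal design Q (QᵀQ = I), b_i = Qᵀ y_i and ‖b_i − b_j‖ = ‖QQᵀ(y_i − y_j)‖ ≤ ‖y_i − y_j‖, so coefficient distances are raw-data distances with the residual part removed. The sketch below is a minimal illustration under assumptions that do not come from the paper: simulated quadratic curves (not the antidepressant data), a monomial basis 1, t, t², trapezoid-rule quadrature for W, a Cholesky factor as the "suitable linear transformation", "orthogonal design" read as orthonormal columns, and a plain Lloyd iteration with a farthest-point start rather than any specific published k-means variant. The function names are hypothetical.

```python
import numpy as np

def fit_coefficients(Y, X):
    """Least-squares coefficients for each observed curve.

    Y: (N, n) array, one curve per row, observed at n common time points.
    X: (n, p) design matrix (basis functions evaluated at those time points).
    Returns an (N, p) array whose i-th row is (X'X)^{-1} X' y_i.
    """
    B, *_ = np.linalg.lstsq(X, Y.T, rcond=None)
    return B.T

def trapezoid_weights(t):
    """Weights w such that sum(w * g(t)) approximates the integral of g over t."""
    dt = np.diff(t)
    w = np.zeros_like(t, dtype=float)
    w[:-1] += dt / 2
    w[1:] += dt / 2
    return w

def l2_transform(B, Phi, w):
    """Hypothetical helper: map coefficients so Euclidean distance = L2 distance.

    Phi: (m, p) basis evaluated on a fine grid; w: quadrature weights on that grid.
    W = Phi' diag(w) Phi approximates the Gram matrix of the basis functions.
    With W = L L' (Cholesky), row i of B @ L equals (L' b_i)', and
    ||L'(b_i - b_j)||^2 = (b_i - b_j)' W (b_i - b_j).
    """
    W = Phi.T @ (w[:, None] * Phi)
    L = np.linalg.cholesky(W)  # lower triangular, W = L @ L.T
    return B @ L

def kmeans(Z, k, n_iter=100):
    """Plain Lloyd iterations started from deterministic farthest-point centers."""
    centers = [Z[0]]
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen center.
        d2 = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[d2.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)  # assign each point to its nearest center
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

# Simulated (hypothetical) data: two groups of 10 curves with different shapes.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 21)
X = np.vander(t, 3, increasing=True)  # columns 1, t, t^2 (not orthogonal)
mean_a = 1 + 2 * t
mean_b = 1 + 2 * t - 4 * t ** 2
Y = np.vstack([mean_a + 0.05 * rng.standard_normal((10, t.size)),
               mean_b + 0.05 * rng.standard_normal((10, t.size))])

# Cluster the fitted curves in the L2 metric via transformed coefficients.
B = fit_coefficients(Y, X)
t_fine = np.linspace(0.0, 1.0, 401)
Z = l2_transform(B, np.vander(t_fine, 3, increasing=True), trapezoid_weights(t_fine))
labels, _ = kmeans(Z, k=2)

# Orthonormal design for the same column space: coefficients are Q' y_i.
Q, _ = np.linalg.qr(X)
Bq = fit_coefficients(Y, Q)
```

Because l2_transform depends only on the fitted curves, re-expressing the same fit in a different basis for the same span (for example Q instead of the monomials) should leave the distances between rows of Z, and hence the k-means partition, unchanged, whereas plain k-means on B need not be.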
