A Comparison of Hierarchical Methods for Clustering Functional Data

Functional data analysis (FDA), the analysis of data that can be considered a set of observed continuous functions, is an increasingly common class of statistical analysis. Cluster analysis is one of the most widely used FDA methods; however, little work has been done to compare the performance of clustering methods on functional data. In this article, a simulation study compares the performance of four major hierarchical methods for clustering functional data. The simulated data varied in three ways: the nature of the signal functions (periodic, nonperiodic, or mixed), the amount of noise added to the signal functions, and the pattern of the true cluster sizes. The Rand index was used to compare the performance of each clustering method. As a secondary goal, the clustering methods were also compared when the number of clusters was misspecified. To illustrate the results, a real functional data set, for which the true clustering structure is believed to be known, was clustered. Comparing the clustering methods on this real data set confirmed the findings of the simulation. The study yields concrete suggestions to help future researchers choose the best method for clustering their functional data.
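
For readers unfamiliar with the Rand index used as the performance measure, it is the proportion of object pairs that two partitions treat consistently: either both place the pair in the same cluster or both separate it. The following is a minimal Python sketch of that definition, not the authors' code; the function name `rand_index` and the example label vectors are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Proportion of object pairs on which two partitions agree.

    A pair counts as an agreement when both partitions put the two
    objects in the same cluster, or both put them in different clusters.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two objects to form a pair")

    agreements = 0
    total_pairs = 0
    for i, j in combinations(range(n), 2):
        same_in_true = labels_true[i] == labels_true[j]
        same_in_pred = labels_pred[i] == labels_pred[j]
        if same_in_true == same_in_pred:
            agreements += 1
        total_pairs += 1
    return agreements / total_pairs


# Hypothetical labels for four curves: the cluster names differ,
# but the two partitions group the curves identically.
truth = [0, 0, 1, 1]
found = [1, 1, 0, 0]
print(rand_index(truth, found))
```

The index depends only on which objects are grouped together, not on the cluster labels themselves, so it can compare a recovered partition with the true one without matching label names.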
