How to Avoid Spurious Cluster Validation? A Methodological Investigation on Simulated and fMRI Data

This paper presents an evaluation of a common approach that has been considered a promising option for exploratory fMRI data analysis. The approach has two stages. First, a sequence of partitions with an increasing number of subsets is created from the data (clustering). Second, the one partition in this sequence that shows the clearest indications of an existing structure is selected (cluster validation). To ensure that the selected partition is actually the best characterization of the data structure, previous studies concentrated on finding the most appropriate validity function(s). In our analysis protocol, we first optimized the sequence of partitions with respect to the given objective function. Our study showed that insufficient optimization of the partition for one or more numbers of clusters can easily yield a spurious validation result, which in turn may lead the analyst to a misleading interpretation of the fMRI experiment. In contrast, sufficient optimization for each included number of clusters provided the basis for a reliable, adequate characterization of the data. Furthermore, it enabled an adequate evaluation of the validity functions. These findings were obtained independently for three clustering algorithms (representing the hard and fuzzy clustering variants) and three up-to-date cluster validity functions. They were derived from analyses of Gaussian clusters, simulated data sets that mimic typical fMRI response signals, and real fMRI data. Based on our results, we propose several options for configuring improved clustering tools.
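The two-stage protocol (optimize one partition per number of clusters, then select the number of clusters with a validity function) can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the authors' implementation. Hard k-means stands in for the three clustering algorithms. "Sufficient optimization" is approximated by keeping the lowest-error partition over many random restarts; the study's actual optimizers may differ. The Calinski-Harabasz index stands in for the validity functions, which are not named here. The synthetic data (four 2-D Gaussian clusters) and all function names are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(data, k, rng, max_iter=100):
    """One Lloyd's k-means run from k randomly chosen data points.

    Returns (sse, labels). The result is only a local optimum of the SSE.
    """
    centroids = [list(p) for p in rng.sample(data, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in data]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):  # move each centroid to the mean of its members
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(data, labels))
    return sse, labels


def best_of_restarts(data, k, restarts, seed=0):
    """Assumed stand-in for 'sufficient optimization': lowest SSE over restarts."""
    rng = random.Random(seed)
    return min((kmeans_once(data, k, rng) for _ in range(restarts)),
               key=lambda result: result[0])


def calinski_harabasz(data, labels, k):
    """Between/within dispersion ratio (higher is better); needs 2 <= k < n."""
    n, dim = len(data), len(data[0])
    overall = [sum(p[d] for p in data) / n for d in range(dim)]
    between = within = 0.0
    for j in range(k):
        members = [p for p, lab in zip(data, labels) if lab == j]
        if not members:
            continue
        mean = [sum(c) / len(members) for c in zip(*members)]
        between += len(members) * sq_dist(mean, overall)
        within += sum(sq_dist(p, mean) for p in members)
    return (between / (k - 1)) / (within / (n - k))


def select_k(data, k_values, restarts, seed=0):
    """Stage 1: optimize one partition per k. Stage 2: pick the best-scoring k."""
    scores = {}
    for k in k_values:
        _, labels = best_of_restarts(data, k, restarts, seed + k)
        scores[k] = calinski_harabasz(data, labels, k)
    return max(scores, key=scores.get), scores


def gaussian_blobs(centers, n_per, sigma, seed=1):
    """Hypothetical test data: isotropic 2-D Gaussian clusters."""
    rng = random.Random(seed)
    return [(cx + rng.gauss(0, sigma), cy + rng.gauss(0, sigma))
            for cx, cy in centers for _ in range(n_per)]


if __name__ == "__main__":
    data = gaussian_blobs([(0, 0), (10, 0), (0, 10), (10, 10)], 25, 1.0)
    best_k, scores = select_k(data, range(2, 7), restarts=100)
    print(best_k, {k: round(s, 1) for k, s in scores.items()})
```

For this particular index, the between-cluster dispersion at fixed k equals the total dispersion minus the within-cluster SSE, so a better-optimized partition always scores higher. Consequently, an under-optimized partition at the true number of clusters (for example with `restarts=1`) can be out-scored by a neighbouring k, which is the kind of spurious validation result the abstract warns about.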
