Partitioning subjects based on high-dimensional fMRI data: comparison of several clustering methods and studying the influence of ICA data reduction in big data

In neuroscience, clustering subjects on the basis of brain dysfunctions is a promising avenue for subtyping mental disorders. It may foster a brain-based categorization system for mental disorders that transcends current symptom-based systems and is biologically more valid than they are. Because changes in functional connectivity (FC) patterns have been shown to be associated with various mental disorders, one appealing approach is to cluster patients according to similarities and differences in their FC patterns. To this end, researchers collect three-way fMRI data, which measure neural activation at several brain locations over time for different patients. They then apply Independent Component Analysis (ICA) to extract FC patterns from the data. However, because of the three-way nature and large size of fMRI data, classical (two-way) clustering methods are ill-suited to clustering patients on these FC patterns. Therefore, a two-step procedure is proposed. First, ICA is applied to each patient’s fMRI data. Next, a clustering algorithm groups the patients into clusters that are homogeneous in terms of FC patterns. Some of the clustering methods considered operate on similarity data, so the modified RV-coefficient is adopted to compute the similarity between patient-specific FC patterns. An extensive simulation study demonstrated two findings. Performing ICA before clustering enhances cluster recovery. Hierarchical clustering using Ward’s method outperforms complete-linkage hierarchical clustering, Affinity Propagation and Partitioning Around Medoids. Moreover, the proposed two-step procedure appears to recover the underlying clustering better than two alternatives: (1) a two-step procedure that combines PCA with clustering and (2) Clusterwise SCA-ECP, which performs PCA and clustering simultaneously.
Additionally, the good performance of the proposed two-step procedure using ICA and Ward’s hierarchical clustering is illustrated on an empirical fMRI data set of dementia patients.
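The clustering step described above can be illustrated with a minimal sketch. It assumes that the patient-specific ICA spatial maps (voxels × components) have already been estimated, for example with a per-patient FastICA run, which is not shown. The modified RV-coefficient below follows its usual definition: the cosine between the vectorised cross-product matrices XX′ and YY′ after their diagonals have been zeroed. It is computed through small p × q products, so no voxel-by-voxel matrix is formed. Because XX′ is unchanged by reordering or sign-flipping the columns of X, the coefficient is insensitive to the component order and sign indeterminacy that ICA leaves. Converting the similarities to the distance sqrt(2 − 2·RV) before Ward linkage is an illustrative choice of ours and not necessarily the one used in the study. That quantity is the Euclidean distance between unit vectors, which is what Ward’s criterion assumes. The function `cluster_patients` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def modified_rv(x, y):
    """Modified RV-coefficient between x (V x p) and y (V x q) sharing rows.

    Equals the cosine between vec(XX') and vec(YY') with both diagonals set
    to zero, computed without forming the V x V cross-product matrices.
    """
    rx = np.sum(x * x, axis=1)  # diagonal of XX' (squared row norms)
    ry = np.sum(y * y, axis=1)  # diagonal of YY'
    # <XX', YY'> = ||X'Y||_F^2; subtract the diagonal contribution.
    num = np.sum((x.T @ y) ** 2) - np.sum(rx * ry)
    # ||XX'||_F^2 = ||X'X||_F^2; subtract the diagonal contribution.
    nx = np.sum((x.T @ x) ** 2) - np.sum(rx ** 2)
    ny = np.sum((y.T @ y) ** 2) - np.sum(ry ** 2)
    return num / np.sqrt(nx * ny)


def cluster_patients(maps, n_clusters):
    """Hypothetical helper: Ward clustering of patients from their ICA maps.

    maps: list of (n_voxels, n_components) arrays, one per patient.
    Returns (cluster labels 1..n_clusters, modified-RV similarity matrix).
    """
    n = len(maps)
    sim = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            sim[i, j] = sim[j, i] = modified_rv(maps[i], maps[j])
    # The modified RV is a cosine, so sqrt(2 - 2 * RV) is the Euclidean
    # distance between the corresponding unit vectors (Ward assumes Euclidean).
    dist = np.sqrt(np.clip(2.0 - 2.0 * sim, 0.0, None))
    # Condensed (upper-triangle, row-major) form expected by scipy's linkage.
    condensed = dist[np.triu_indices(n, k=1)]
    tree = linkage(condensed, method="ward")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return labels, sim


# Synthetic demo: two groups of five patients whose maps share a group template.
rng = np.random.default_rng(0)
n_vox, n_comp = 200, 3
group_a = rng.standard_normal((n_vox, n_comp))
group_b = rng.standard_normal((n_vox, n_comp))
maps = [group_a + 0.3 * rng.standard_normal((n_vox, n_comp)) for _ in range(5)]
maps += [group_b + 0.3 * rng.standard_normal((n_vox, n_comp)) for _ in range(5)]
labels, sim = cluster_patients(maps, n_clusters=2)
print(labels)
```

With this well-separated toy data, the first five and last five patients should end up in different clusters. On real fMRI data, both the number of clusters and the distance transformation would need to be chosen with care.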
