Intraclass correlation: improved modeling approaches and applications for neuroimaging

Intraclass correlation (ICC) is a reliability metric that gauges the similarity of measurements taken when, for example, entities are measured under similar, or even identical, well‐controlled conditions; in MRI applications, such groupings include runs/sessions, twins, parent/child pairs, scanners, and sites. Popular definitions and interpretations of ICC are usually framed under the conventional ANOVA platform. Here, we provide a comprehensive overview of prior uses of ICC analysis in neuroimaging, and we show that the standard ANOVA framework is often limited and inflexible in its modeling capabilities. These intrinsic limitations motivate several improvements. Specifically, we start with the conventional ICC model under the ANOVA platform and extend it along two dimensions: first, fixing the failure in ICC estimation when negative values occur under degenerate circumstances, and second, incorporating precision information of effect estimates into the ICC model. These endeavors lead to four modeling strategies: linear mixed‐effects (LME), regularized mixed‐effects (RME), multilevel mixed‐effects (MME), and regularized multilevel mixed‐effects (RMME). Compared to ANOVA, each of these four models directly provides estimates for fixed effects and their statistical significance, in addition to the ICC estimate. These new modeling approaches can also accommodate missing data and fixed effects for confounding variables. More importantly, we show that the MME and RMME approaches offer more accurate characterization and decomposition of the variance components, leading to more robust ICC computation. Based on these theoretical considerations and model performance comparisons with a real experimental dataset, we offer the following general‐purpose recommendations.
First, ICC estimation through MME or RMME is preferable when precision information (i.e., weights that more accurately allocate the variances in the data) is available for the effect estimate; when precision information is unavailable, ICC estimation through LME or RME is the preferred option. Second, even though the absolute agreement version, ICC(2,1), is presently more popular in the field, the consistency version, ICC(3,1), is a practical and informative choice for whole‐brain ICC analysis that achieves a well‐balanced compromise when all potential fixed effects are accounted for. Third, we discuss approaches for clear, meaningful, and useful reporting of results in ICC analysis. All models, ICC formulations, and related statistical testing methods have been implemented in an open‐source program, 3dICC, which is publicly available as part of the AFNI suite. Even though our work here focuses on the whole‐brain level, the modeling strategy and recommendations apply equally to other situations, such as the voxel, region, and network levels.
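To make the distinction between ICC(2,1) and ICC(3,1) concrete, the following is a minimal sketch of the conventional ANOVA mean-squares computation, assuming a complete two-way layout with one measurement per subject and session. The function name `anova_icc` and the toy data are hypothetical illustrations and are not taken from 3dICC. The sketch also shows how the ANOVA estimate can turn negative in a degenerate case, which is the failure that the mixed-effects formulations are meant to avoid.

```python
import numpy as np


def anova_icc(y):
    """Return (ICC(2,1), ICC(3,1)) from two-way ANOVA mean squares.

    y is an (n subjects) x (k sessions) array with no missing cells.
    Hypothetical helper for illustration; not part of AFNI or 3dICC.
    """
    y = np.asarray(y, dtype=float)
    n, k = y.shape
    grand = y.mean()

    # Sums of squares for subjects (rows), sessions (columns), and residual.
    ss_rows = k * np.sum((y.mean(axis=1) - grand) ** 2)
    ss_cols = n * np.sum((y.mean(axis=0) - grand) ** 2)
    ss_err = np.sum((y - grand) ** 2) - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-subject mean square
    msc = ss_cols / (k - 1)              # between-session mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    # Absolute agreement: session offsets count against reliability.
    icc21 = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    # Consistency: session offsets are treated as fixed and ignored.
    icc31 = (msr - mse) / (msr + (k - 1) * mse)
    return icc21, icc31


# Toy data (assumed): session 2 equals session 1 plus a constant offset,
# so subject ordering is perfectly preserved across sessions.
shifted = [[1, 6], [2, 7], [3, 8], [4, 9]]
print(anova_icc(shifted))

# Degenerate toy data (assumed): subjects' values are reversed across
# sessions, which drives the ANOVA-based estimate below zero.
reversed_order = [[1, 3], [2, 2], [3, 1]]
print(anova_icc(reversed_order))
```

In the first toy dataset, the consistency version is essentially 1 while the absolute agreement version is much lower, because the session offset only enters the ICC(2,1) denominator. In the second, the between-subject mean square falls below the residual mean square, so the ANOVA formula returns a negative value even though ICC is defined as a ratio of variances.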
