Pooling Morphometric Estimates: A Statistical Equivalence Approach

Changes in hardware or image‐processing settings are common in large multicenter studies. To pool MRI data acquired before and after such changes, it is necessary to demonstrate that the changes do not affect MRI‐based measurements. Classical null‐hypothesis significance testing is inappropriate for this purpose because it is designed to detect differences, not to demonstrate similarity; a nonsignificant result does not establish equivalence. We used statistical equivalence testing to address this limitation.
Equivalence testing was carried out on three datasets: (1) cortical thickness and automated hippocampal volume estimates obtained from healthy individuals imaged using different multichannel head coils; (2) manual hippocampal volumetry performed by two readers; and (3) corpus callosum area estimates obtained using an automated method with manual cleanup carried out by two readers. All equivalence tests used the “two one‐sided tests” (TOST) approach. Power analyses of the TOST were used to estimate the sample sizes required for well‐powered equivalence testing, with mean and standard deviation estimates from the automated hippocampal volume dataset serving as a worked example.
Cortical thickness values were found to be equivalent over 61% of the cortex when different head coils were used (q < .05, false discovery rate correction). Automated hippocampal volume estimates obtained using the same two coils were statistically equivalent (TOST P = 4.28 × 10⁻¹⁵). Manual hippocampal volume estimates obtained by two readers were not statistically equivalent (TOST P = .97). The use of different readers to carry out limited correction of automated corpus callosum segmentations yielded equivalent area estimates (TOST P = 1.28 × 10⁻¹⁴). Power analysis of simulated and automated hippocampal volume data demonstrated that the equivalence margin affects the number of subjects required for well‐powered equivalence tests.
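As an illustration of the TOST procedure named above, the following is a minimal Python sketch for a paired design, in which the same subjects are measured under two conditions. The abstract does not specify the test variant, the equivalence margin, or the software calls used, so the paired t formulation, the function name `paired_tost`, the margin of 200 mm³, and the synthetic volumes are all assumptions made for illustration only.

```python
import numpy as np
from scipy import stats


def paired_tost(x, y, margin):
    """Paired TOST p-value for equivalence of means within +/- margin.

    Hypothetical helper: assumes paired measurements and a symmetric margin.
    A small p-value supports equivalence.
    """
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    n = d.size
    se = d.std(ddof=1) / np.sqrt(n)  # standard error of the mean difference
    df = n - 1

    # Test 1: H0 is "mean difference <= -margin"; reject for large t.
    t_lower = (d.mean() + margin) / se
    p_lower = stats.t.sf(t_lower, df)

    # Test 2: H0 is "mean difference >= +margin"; reject for small t.
    t_upper = (d.mean() - margin) / se
    p_upper = stats.t.cdf(t_upper, df)

    # Equivalence is claimed only if both one-sided tests reject,
    # so the overall p-value is the larger of the two.
    return max(p_lower, p_upper)


# Synthetic example (values are invented, not taken from the study).
rng = np.random.default_rng(0)
coil_a = rng.normal(4000.0, 400.0, size=30)          # volumes in mm^3
coil_b = coil_a + rng.normal(0.0, 40.0, size=30)     # small random disagreement
coil_c = coil_a + 300.0 + rng.normal(0.0, 40.0, size=30)  # systematic offset

p_equiv = paired_tost(coil_a, coil_b, margin=200.0)
p_shift = paired_tost(coil_a, coil_c, margin=200.0)
print(f"no offset:    TOST p = {p_equiv:.3g}")
print(f"300 mm^3 off: TOST p = {p_shift:.3g}")
```

In this sketch the overall p-value is the larger of the two one-sided p-values, which is why a systematic offset larger than the margin produces a p-value near 1 rather than near 0.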
We have presented a statistical method for determining whether morphometric measures obtained under variable conditions can be pooled. Equivalence testing is applicable to analyses in which experimental conditions vary over the course of a study.
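The dependence of sample size on the equivalence margin can be sketched with a normal approximation to the power of a paired TOST. This is not the power analysis used in the study: the function names `tost_power` and `required_n`, the standard deviation of 100, and the margins of 50 and 25 are hypothetical, and the normal approximation slightly overstates power for small samples compared with an exact t-based calculation.

```python
import math
from scipy.stats import norm


def tost_power(n, sd_diff, margin, true_diff=0.0, alpha=0.05):
    """Approximate power of a paired TOST (normal approximation).

    Equivalence is declared when the mean difference falls between
    -margin + z*se and +margin - z*se, where z is the 1 - alpha quantile.
    """
    se = sd_diff / math.sqrt(n)
    z = norm.ppf(1.0 - alpha)
    power = (norm.cdf((margin - true_diff) / se - z)
             + norm.cdf((margin + true_diff) / se - z) - 1.0)
    return max(power, 0.0)  # the approximation can dip below zero


def required_n(sd_diff, margin, true_diff=0.0, alpha=0.05,
               target_power=0.8, n_max=100_000):
    """Smallest n (number of pairs) reaching the target power, or None."""
    for n in range(2, n_max + 1):
        if tost_power(n, sd_diff, margin, true_diff, alpha) >= target_power:
            return n
    return None


# Hypothetical inputs: SD of paired differences = 100 (arbitrary units).
n_wide = required_n(sd_diff=100.0, margin=50.0)
n_narrow = required_n(sd_diff=100.0, margin=25.0)
print("pairs needed, margin 50:", n_wide)
print("pairs needed, margin 25:", n_narrow)
```

Under this approximation, halving the margin roughly quadruples the required number of subjects, because the margin enters the power formula only through its ratio to the standard error.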

[1]  J. Bremner,et al.  MR-based in vivo hippocampal volumetrics: 1. Review of methodologies currently employed , 2005, Molecular Psychiatry.

[2]  C. Jack,et al.  MRI of hippocampal volume loss in early Alzheimer's disease in relation to ApoE genotype and biomarkers , 2008, Brain : a journal of neurology.

[3]  Scott Peltier,et al.  Abnormalities of intrinsic functional connectivity in autism spectrum disorders, , 2009, NeuroImage.

[4]  Wendy Bogers,et al.  Automated subcortical segmentation using FIRST: Test–retest reliability, interscanner reliability, and comparison to manual segmentation , 2013, Human brain mapping.

[5]  Bruce Fischl,et al.  Within-subject template estimation for unbiased longitudinal image analysis , 2012, NeuroImage.

[6]  A M Dale,et al.  Measuring the thickness of the human cerebral cortex from magnetic resonance images. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[7]  Daniel P. Kennedy,et al.  The Autism Brain Imaging Data Exchange: Towards Large-Scale Evaluation of the Intrinsic Brain Architecture in Autism , 2013, Molecular Psychiatry.

[8]  David F. Tate,et al.  Reliability and validity of MRI-based automated volumetry software relative to auto-assisted manual measurement of subcortical structures in HIV-infected patients from a multisite study , 2010, NeuroImage.

[9]  A. Nowacki,et al.  Understanding Equivalence and Noninferiority Testing , 2011, Journal of General Internal Medicine.

[10]  Anders M. Dale,et al.  Reliability of MRI-derived measurements of human cerebral cortical thickness: The effects of field strength, scanner upgrade and manufacturer , 2006, NeuroImage.

[11]  Y. Benjamini,et al.  Controlling the false discovery rate: a practical and powerful approach to multiple testing , 1995 .

[12]  Giovanni B. Frisoni,et al.  Brain morphometry reproducibility in multi-center 3T MRI studies: A comparison of cross-sectional and longitudinal segmentations , 2013, NeuroImage.

[13]  Peter Dalgaard,et al.  R Development Core Team (2010): R: A language and environment for statistical computing , 2010 .

[14]  Donald J. Schuirmann A comparison of the Two One-Sided Tests Procedure and the Power Approach for assessing the equivalence of average bioavailability , 1987, Journal of Pharmacokinetics and Biopharmaceutics.

[15]  Anders M. Dale,et al.  MRI-derived measurements of human subcortical, ventricular and intracranial brain volumes: Reliability effects of scan sessions, acquisition sequences, data analyses, scanner upgrade, scanner vendors and field strengths , 2009, NeuroImage.