Instrumentation bias in the use and evaluation of scientific software: recommendations for reproducible practices in the computational sciences

“By honest I don't mean that you only tell what's true. But you make clear the entire situation. You make clear all the information that is required for somebody else who is intelligent to make up their mind.” (Richard Feynman)

The neuroscience community benefits significantly from the proliferation of imaging-related analysis software packages. Established packages such as SPM (Ashburner, 2012), the FMRIB Software Library (FSL) (Jenkinson et al., 2012), FreeSurfer (Fischl, 2012), Slicer (Fedorov et al., 2012), and the AFNI toolkit (Cox, 2012) aid neuroimaging researchers around the world in performing complex analyses as part of ongoing neuroscience research. In addition to distributing robust software tools, these packages continue to incorporate algorithmic innovations that improve their analysis tools. As fellow scientists who actively participate in neuroscience research through our contributions to the Insight Toolkit (e.g., Johnson et al., 2007; Ibanez et al., 2009; Tustison and Avants, 2012) and other packages such as MindBoggle, Nipype (Gorgolewski et al., 2011), and the Advanced Normalization Tools (ANTs) (Avants et al., 2010, 2011), we notice an increasing number of publications that aim to compare algorithms fairly, which is, in principle, a good thing. Our concern is the lack of detail with which these comparisons are often presented and the corresponding possibility of instrumentation bias (Sackett, 1979), in which “defects in the calibration or maintenance of measurement instruments may lead to systematic deviations from true values” (here we regard software as a type of instrument requiring proper “calibration” and “maintenance” for accurate measurements). Based on our experience (including our own mistakes), we propose a preliminary set of guidelines that seek to minimize such bias, with the understanding that the discussion will require a more comprehensive response from the larger neuroscience community.
Our intent is to raise awareness among both authors and reviewers of the issues that arise when comparing quantitative algorithms. Although we focus here largely on image registration, these recommendations are relevant to other application areas in biologically focused computational image analysis and to reproducible computational science in general. This commentary complements recent papers that highlight statistical bias (Kriegeskorte et al., 2009; Vul and Pashler, 2012), bias induced by registration metrics (Tustison et al., 2012) and registration strategy (Yushkevich et al., 2010), and guideline papers for software development (Prlic and Procter, 2012).

[1] Brian B. Avants, et al. N4ITK: Improved N3 Bias Correction, 2010, IEEE Transactions on Medical Imaging.

[2] July, 1890, The Hospital.

[3] Robert W. Cox, et al. AFNI: What a long strange trip it's been, 2012, NeuroImage.

[4] Woo Suk Hwang, et al. Who is accountable?, 2007, Nature.

[5] Andreas Prlic, et al. Ten Simple Rules for the Open Development of Scientific Software, 2012, PLoS Comput. Biol.

[6] Torsten Rohlfing, et al. Image Similarity and Tissue Overlaps as Surrogates for Image Registration Accuracy: Widely Used but Unreliable, 2012, IEEE Transactions on Medical Imaging.

[7] Guojie Li, et al. From the Editor-in-Chief, 1995, Journal of Computer Science and Technology.

[8] Ed Vul, et al. Voodoo and circularity errors, 2012, NeuroImage.

[9] Arno Klein, et al. A reproducible evaluation of ANTs similarity metric performance in brain image registration, 2011, NeuroImage.

[10] David H. Wolpert, et al. No free lunch theorems for optimization, 1997, IEEE Trans. Evol. Comput.

[11] Satrajit S. Ghosh, et al. Nipype: A Flexible, Lightweight and Extensible Neuroimaging Data Processing Framework in Python, 2011, Front. Neuroinform.

[12] Bruce Fischl, et al. FreeSurfer, 2012, NeuroImage.

[13] Sébastien Ourselin, et al. A comparison of voxel and surface based cortical thickness estimation methods, 2011, NeuroImage.

[14] Arthur W. Toga, et al. Online resource for validation of brain segmentation methods, 2009, NeuroImage.

[15] D. Louis Collins, et al. BEaST: Brain extraction based on nonlocal segmentation technique, 2012, NeuroImage.

[16] Arno Klein, et al. Evaluation of 14 nonlinear deformation algorithms applied to human brain MRI registration, 2009, NeuroImage.

[17] Brian B. Avants, et al. The optimal template effect in hippocampus studies of diseased populations, 2010, NeuroImage.

[18] Max A. Viergever, et al. elastix: A Toolbox for Intensity-Based Medical Image Registration, 2010, IEEE Transactions on Medical Imaging.

[19] J. Gee, et al. Logical circularity in voxel-based analysis: Normalization strategy may induce statistical bias, 2014, Human Brain Mapping.

[20] Milan Sonka, et al. 3D Slicer as an image computing platform for the Quantitative Imaging Network, 2012, Magnetic Resonance Imaging.

[21] Ron Mengelers, et al. The Effects of FreeSurfer Version, Workstation Type, and Macintosh Operating System Version on Anatomical Volume and Cortical Thickness Measurements, 2012, PLoS ONE.

[22] John Ashburner, et al. SPM: A history, 2012, NeuroImage.

[23] Dinggang Shen, et al. S-HAMMER: Hierarchical attribute-guided, symmetric diffeomorphic registration for MR brain images, 2014, Human Brain Mapping.

[24] W. K. Simmons, et al. Circular analysis in systems neuroscience: the dangers of double dipping, 2009, Nature Neuroscience.

[25] Dinggang Shen, et al. Hierarchical Attribute-Guided Symmetric Diffeomorphic Registration for MR Brain Images, 2012, MICCAI.

[26] Brian B. Avants, et al. Bias in estimation of hippocampal atrophy using deformation-based morphometry arises from asymmetric global normalization: An illustration in ADNI 3 T MRI data, 2010, NeuroImage.

[27] D. Louis Collins, et al. Automated segmentation of basal ganglia and deep brain structures in MRI of Parkinson’s disease, 2012, International Journal of Computer Assisted Radiology and Surgery.

[28] D. Sackett. Bias in analytic research, 1979, Journal of Chronic Diseases.