Eliminating accidental deviations to minimize generalization error and maximize replicability: Applications in connectomics and genomics

Reproducibility, the ability to replicate scientific findings, is a prerequisite for scientific discovery and clinical utility. Troublingly, we are in the midst of a reproducibility crisis. A key to reproducibility is that multiple measurements of the same item (e.g., experimental sample or clinical participant) under fixed experimental constraints are relatively similar to one another. We demonstrate that existing reproducibility statistics, such as the intraclass correlation coefficient and fingerprinting, are not valid measures of reproducibility, in that they can yield unreasonably low or high values even without model misspecification. We therefore propose a novel statistic, discriminability, which quantifies the degree to which an individual’s samples are relatively similar to one another, without restricting the data to be univariate, Gaussian, or even Euclidean. Using this statistic, we introduce the possibility of optimizing experimental design by increasing discriminability, and we prove that optimizing discriminability improves performance bounds in subsequent inference tasks. Across extensive simulated and real datasets (focusing on brain imaging, with a demonstration in genomics), optimizing data discriminability is the only approach that improves performance on all subsequent inference tasks for each dataset. We therefore suggest that designing experiments and analyses to optimize discriminability may be a crucial step in solving the reproducibility crisis and, more generally, in mitigating accidental measurement error.
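
To make the idea of "an individual's samples being relatively similar to one another" concrete, the following is a minimal sketch of one way a sample discriminability score could be computed: the fraction of comparisons in which a within-individual distance is smaller than a between-individual distance from the same reference measurement. The function name `discriminability`, the use of Euclidean distance, and the half-credit handling of ties are assumptions made for illustration; the abstract does not specify the authors' estimator, and any other distance could be substituted.

```python
import numpy as np

def discriminability(X, labels):
    """Hypothetical sketch: fraction of times a repeated measurement of the
    same individual is closer than a measurement of a different individual."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)

    # Pairwise Euclidean distances (assumed here; any distance could be used).
    diffs = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diffs ** 2).sum(axis=-1))

    hits, total = 0.0, 0
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False                      # exclude the measurement itself
        other = labels != labels[i]
        for j in np.flatnonzero(same):
            d_within = D[i, j]
            # Count between-individual distances larger than the within one;
            # ties receive half credit (an illustrative choice).
            hits += np.sum(D[i, other] > d_within)
            hits += 0.5 * np.sum(D[i, other] == d_within)
            total += int(other.sum())

    if total == 0:
        raise ValueError("need repeated measurements and at least two individuals")
    return hits / total

# Two individuals, two measurements each, well separated.
X_good = [[0.0], [0.1], [10.0], [10.1]]
print(discriminability(X_good, [0, 0, 1, 1]))
```

Under this sketch, a score near 1 means repeated measurements of an individual are almost always closer to each other than to anyone else's, while pairing the same points with labels that mix the two clusters drives the score toward 0.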
