Analysis validation has been neglected in the Age of Reproducibility

Increasingly complex statistical models are being used for the analysis of biological data. Recent commentary has focused on the ability to compute the same outcome for a given dataset (reproducibility). We argue that a reproducible statistical analysis is not necessarily valid because of unique patterns of nonindependence in every biological dataset. We advocate that analyses should be evaluated with known-truth simulations that capture biological reality, a process we call “analysis validation.” We review the process of validation and suggest criteria that a validation project should meet. We find that different fields of science have historically failed to meet all criteria, and we suggest ways to implement meaningful validation in training and practice.
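As a rough illustration of what a known-truth simulation can look like, the sketch below generates data in which the true treatment effect is zero but observations within a group are nonindependent, then counts how often two analyses declare a significant effect. It is a toy example under stated assumptions, not the procedure or criteria described in the paper: the function names, the grouped-data design, the parameter values, and the fixed critical values (1.96 for the pooled test, 2.101 for a two-sided Student t test with 18 degrees of freedom) are all hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_dataset(rng, n_groups=20, n_per_group=25, group_sd=1.0, noise_sd=1.0):
    """Simulate grouped data where the KNOWN TRUTH is: treatment has no effect.

    Treatment is assigned per group, and every observation in a group shares
    one random offset, so observations within a group are not independent.
    """
    rows = []
    for g in range(n_groups):
        treated = g % 2  # alternate groups between control (0) and treated (1)
        offset = rng.gauss(0.0, group_sd)  # shared group-level deviation
        values = [offset + rng.gauss(0.0, noise_sd) for _ in range(n_per_group)]
        rows.append((treated, values))
    return rows


def welch_t(a, b):
    """Welch t statistic for the difference in means of two samples."""
    se2 = statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    return (statistics.fmean(a) - statistics.fmean(b)) / se2 ** 0.5


def naive_analysis(rows, critical=1.96):
    """Pool all observations and treat them as independent (assumed-naive analysis)."""
    a = [v for treated, vals in rows if treated == 1 for v in vals]
    b = [v for treated, vals in rows if treated == 0 for v in vals]
    return abs(welch_t(a, b)) > critical


def group_mean_analysis(rows, critical=2.101):
    """Collapse each group to its mean, so the group is the unit of analysis."""
    a = [statistics.fmean(vals) for treated, vals in rows if treated == 1]
    b = [statistics.fmean(vals) for treated, vals in rows if treated == 0]
    return abs(welch_t(a, b)) > critical


def false_positive_rate(analysis, n_sims=500, seed=1):
    """Fraction of null simulations in which the analysis reports an effect."""
    rng = random.Random(seed)
    hits = sum(analysis(simulate_dataset(rng)) for _ in range(n_sims))
    return hits / n_sims


if __name__ == "__main__":
    print("naive FPR:     ", false_positive_rate(naive_analysis))
    print("group-mean FPR:", false_positive_rate(group_mean_analysis))
```

Both analyses are fully reproducible given the seed, yet under this design the pooled analysis should report false positives far more often than the nominal 5% rate, while the group-mean analysis should stay close to it. Only comparison against the simulated truth reveals the difference.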
