Raiders of the lost HARK: a reproducible inference framework for big data science

Hypothesizing after the results are known (HARK) has been disparaged as data dredging, and safeguards including hypothesis preregistration and statistically rigorous oversight have been recommended. Despite its potential drawbacks, HARK has deepened thinking about complex causal processes. Some of the precautions against HARK can conflict with the modern reality of researchers’ obligations to use big, ‘organic’ data sources, from high-throughput genomics to social media streams. Here we propose a HARK-solid, reproducible inference framework suitable for big data, based on models that formalize hypotheses. Reproducibility is attained by employing two levels of model validation: internal (relative to data collated around hypotheses) and external (independent of the hypotheses used to generate data and of the data used to generate hypotheses). With a model-centered paradigm, the focus of reproducibility shifts from the ability of others to reproduce both the data and the specific inferences of a study to the ability to evaluate models as representations of reality. Validation underpins ‘natural selection’ in a knowledge base maintained by the scientific community. This in turn helps the community to be more productive in generating and critically evaluating theories that integrate wider, complex systems.
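
The two validation levels named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the straight-line model, the k-fold cross-validation used as the internal check, the simulated data sets, and every function name (`fit_line`, `internal_validation`, `external_validation`, `simulate`) are assumptions chosen for illustration only.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def mse(model, xs, ys):
    """Mean squared prediction error of a fitted (slope, intercept) model."""
    slope, intercept = model
    return mean((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def internal_validation(xs, ys, k=5):
    """Internal check (assumed here to be k-fold cross-validation):
    error estimated only from the data collated around the hypothesis."""
    n = len(xs)
    errors = []
    for fold in range(k):
        held_out = list(range(fold, n, k))
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors.append(mse(model, [xs[i] for i in held_out],
                          [ys[i] for i in held_out]))
    return mean(errors)


def external_validation(model, ext_xs, ext_ys):
    """External check: error of the frozen model on independent data."""
    return mse(model, ext_xs, ext_ys)


def simulate(n, slope, intercept, noise_sd, rng):
    """Hypothetical data generator standing in for a real data source."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [slope * x + intercept + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


rng = random.Random(0)
study_xs, study_ys = simulate(200, 2.0, 1.0, 1.0, rng)      # hypothesis-driven data
ext_xs, ext_ys = simulate(200, 2.0, 1.0, 1.0, rng)          # independent, same process
shifted_xs, shifted_ys = simulate(200, 1.0, 1.0, 1.0, rng)  # independent, different process

model = fit_line(study_xs, study_ys)
internal_err = internal_validation(study_xs, study_ys)
external_err = external_validation(model, ext_xs, ext_ys)
shifted_err = external_validation(model, shifted_xs, shifted_ys)
```

In this toy setting the internal and external errors are similar when the independent data follow the same process, while the external error grows sharply when they do not, which is the kind of signal an internal check alone cannot provide.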
