Data availability, reusability, and analytic reproducibility: evaluating the impact of a mandatory open data policy at the journal Cognition

Access to data is a critical feature of an efficient, progressive and ultimately self-correcting scientific ecosystem. But the extent to which in-principle benefits of data sharing are realized in practice is unclear. Crucially, it is largely unknown whether published findings can be reproduced by repeating reported analyses upon shared data (‘analytic reproducibility’). To investigate this, we conducted an observational evaluation of a mandatory open data policy introduced at the journal Cognition. Interrupted time-series analyses indicated a substantial post-policy increase in data available statements (104/417, 25% pre-policy to 136/174, 78% post-policy), although not all data appeared reusable (23/104, 22% pre-policy to 85/136, 62% post-policy). For 35 of the articles determined to have reusable data, we attempted to reproduce 1324 target values. Ultimately, 64 values could not be reproduced within a 10% margin of error. For 22 articles all target values were reproduced, but 11 of these required author assistance. For the remaining 13 articles at least one value could not be reproduced despite author assistance. Importantly, there were no clear indications that original conclusions were seriously impacted. Mandatory open data policies can increase the frequency and quality of data sharing. However, suboptimal data curation, unclear analysis specification and reporting errors can impede analytic reproducibility, undermining the utility of data sharing and the credibility of scientific findings.
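
The arithmetic behind these figures can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis code: the abstract does not define how the 10% margin was computed, so the percentage-error formula (absolute difference relative to the reported value), the strict treatment of the boundary, and the function names `percentage_error`, `is_reproduced` and `percent` are all assumptions made here for illustration.

```python
def percentage_error(reported: float, obtained: float) -> float:
    """Absolute difference between reported and reproduced value, as a
    percentage of the reported value (assumed definition, see lead-in)."""
    if reported == 0:
        raise ValueError("percentage error is undefined for a reported value of 0")
    return abs(obtained - reported) / abs(reported) * 100.0


def is_reproduced(reported: float, obtained: float, margin_pct: float = 10.0) -> bool:
    """True if the reproduced value lies strictly within the margin.
    Treating exactly 10% as a failure is an assumption."""
    return percentage_error(reported, obtained) < margin_pct


def percent(numerator: int, denominator: int) -> float:
    """Express a count as a percentage of a total."""
    return 100.0 * numerator / denominator


# Counts taken directly from the abstract.
available_pre = percent(104, 417)    # data available statements, pre-policy
available_post = percent(136, 174)   # data available statements, post-policy
reusable_pre = percent(23, 104)      # reusable data, pre-policy
reusable_post = percent(85, 136)     # reusable data, post-policy
not_reproduced = percent(64, 1324)   # share of target values outside the margin

if __name__ == "__main__":
    print(f"available: {available_pre:.1f}% -> {available_post:.1f}%")
    print(f"reusable:  {reusable_pre:.1f}% -> {reusable_post:.1f}%")
    print(f"values not reproduced: {not_reproduced:.1f}%")
```

Under this assumed definition, a reported value of 2.0 reproduced as 2.1 would count as reproduced (5% error), whereas 2.5 would not (25% error).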
