Decision-Making in Research Tasks with Sequential Testing

Background: In a recent and controversial essay published in PLoS Medicine, J. P. A. Ioannidis argued that in some research fields, most published findings are false. Theoretical reasoning shows that small effect sizes, error-prone tests, low priors of the tested hypotheses, and biases in the evaluation and publication of research findings all increase the fraction of false positives. These results raise concerns about the reliability of research. However, they rest on a very simple model of scientific research in which single tests are used to evaluate independent hypotheses.

Methodology/Principal Findings: In this study, we present computer simulations and experimental approaches for analyzing more realistic scenarios, in which research tasks are solved sequentially, i.e., subsequent tests can be chosen depending on previous results. We investigate simple sequential testing as well as scenarios where only a selected subset of results can be published and used in future rounds of test choice. Our computer simulations indicate that, for the tasks analyzed in this study, the fraction of false findings among the positive ones declines over several rounds of testing if the most informative tests are performed. Our experiments show that human subjects frequently perform the most informative tests, leading to the decline in false positives expected from the simulations.

Conclusions/Significance: For the research tasks studied here, findings tend to become more reliable over time. We also find that performance was surprisingly inefficient in those experimental settings where not all performed tests could be published. Our results may help optimize existing procedures used in the practice of scientific research and provide guidance for the development of novel forms of scholarly communication.
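To make the abstract's quantities concrete, here is a minimal sketch in Python. It is not the authors' actual simulation code: it computes the single-test positive predictive value PPV = (1 - beta) * R / ((1 - beta) * R + alpha) from Ioannidis's framework (R is the pre-study odds that a tested hypothesis is true, alpha the Type I error rate, and 1 - beta the power), the expected information gain of one test in the sense of Lindley (1956), and a toy sequential-testing simulation with naive Bayesian updating. The update rule, the 0.5 acceptance threshold, and all parameter values are illustrative assumptions.

```python
import math
import random

def ppv(prior_odds_r, alpha=0.05, power=0.8):
    """Single-test positive predictive value in Ioannidis's framework:
    PPV = (1 - beta) * R / ((1 - beta) * R + alpha)."""
    return power * prior_odds_r / (power * prior_odds_r + alpha)

def expected_information_gain(p, alpha=0.05, power=0.8):
    """Expected entropy reduction from one test outcome (Lindley, 1956):
    H(prior) minus the expected entropy of the posterior."""
    def h(q):  # binary entropy in bits
        return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)
    p_pos = power * p + alpha * (1 - p)       # P(positive result)
    post_pos = power * p / p_pos              # P(true | positive)
    post_neg = (1 - power) * p / (1 - p_pos)  # P(true | negative)
    return h(p) - (p_pos * h(post_pos) + (1 - p_pos) * h(post_neg))

def simulate_sequential(prior_true=0.1, alpha=0.05, power=0.8,
                        rounds=5, n_hypotheses=50_000, seed=0):
    """Toy simulation: every hypothesis is retested each round and its
    posterior is updated by Bayes' rule; a hypothesis counts as a positive
    finding while its posterior exceeds 0.5. Returns the fraction of false
    hypotheses among the positive findings after each round."""
    rng = random.Random(seed)
    truth = [rng.random() < prior_true for _ in range(n_hypotheses)]
    posterior = [prior_true] * n_hypotheses
    false_fraction = []
    for _ in range(rounds):
        for i, is_true in enumerate(truth):
            # A true hypothesis yields a positive result with probability
            # `power`; a false one with probability `alpha`.
            positive = rng.random() < (power if is_true else alpha)
            p = posterior[i]
            if positive:
                posterior[i] = power * p / (power * p + alpha * (1 - p))
            else:
                posterior[i] = (1 - power) * p / ((1 - power) * p + (1 - alpha) * (1 - p))
        accepted = [i for i, p in enumerate(posterior) if p > 0.5]
        n_false = sum(1 for i in accepted if not truth[i])
        false_fraction.append(n_false / len(accepted) if accepted else 0.0)
    return false_fraction

if __name__ == "__main__":
    print(f"single-test PPV at R = 1:9: {ppv(1 / 9):.3f}")
    print(f"information gain of one test at p = 0.1: {expected_information_gain(0.1):.3f} bits")
    print("false fraction among positives by round:", simulate_sequential())
```

With a low prior (R = 1:9), a single test yields a PPV of about 0.64, so roughly a third of the positive findings are false; in the toy simulation, repeated testing drives the false fraction among positive findings down over successive rounds, mirroring the qualitative trend reported above.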

[1] H. Campbell, et al. Commentary: rare alleles, modest genetic effects and the need for collaboration. International Journal of Epidemiology, 2007.

[2] Louisa M. Slowiaczek, et al. Information selection and use in hypothesis testing: What is a good question, and what is a good answer? Memory & Cognition, 1992.

[3] R. Hoffmann. A wiki for the life sciences where authorship matters. Nature Genetics, 2008.

[4] D. Lindley. On a Measure of the Information Provided by an Experiment. 1956.

[5] Ryan D. Csada, et al. The "File Drawer Problem" of Non-Significant Results: Does It Apply to Biological Research? 1996.

[6] Andrey Rzhetsky, et al. Microparadigms: chains of collective reasoning in publications about molecular interactions. Proceedings of the National Academy of Sciences of the United States of America, 2006.

[7] J. Ioannidis. Contradicted and initially stronger effects in highly cited clinical research. JAMA, 2005.

[8] Thomas A. Trikalinos, et al. Early extreme contradictory estimates may appear in published research: the Proteus phenomenon in molecular genetics research and randomized trials. Journal of Clinical Epidemiology, 2005.

[9] C. Howson and P. Urbach. Scientific Reasoning: The Bayesian Approach. 1989.

[10] Robert Hoffmann, et al. Temporal patterns of genes in scientific publications. Proceedings of the National Academy of Sciences, 2007.

[11] P. C. Wason. Reasoning about a Rule. The Quarterly Journal of Experimental Psychology, 1968.

[12] R. Hanson. Could gambling save science? Encouraging an honest consensus. 1995.

[13] Jonathan D. Nelson. Finding useful questions: on Bayesian diagnosticity, probability, impact, and information gain. Psychological Review, 2005.

[14] J. Ioannidis. Why Most Published Research Findings Are False. PLoS Medicine, 2005.

[15] D. Kahneman, et al. Heuristics and Biases: The Psychology of Intuitive Judgment. 2002.

[16] Charles A. Holt, et al. Information Cascades in the Laboratory. 1998.

[17] J. Ioannidis, et al. Evolution and Translation of Research Findings: From Bench to Where? PLoS Clinical Trials, 2006.

[18] R. Lindsay, et al. On Estimating the Diagnosticity of Eyewitness Nonidentifications. 1980.

[19] J. Ioannidis, et al. Persistence of contradicted claims in the literature. JAMA, 2007.

[20] S. Bikhchandani, et al. Herd Behavior in Financial Markets. IMF Staff Papers, 2000.

[21] A. Tversky, et al. Subjective Probability: A Judgment of Representativeness. 1972.

[22] Corinne Zimmerman. The development of scientific reasoning skills. 2000.

[23] S. Goodman, et al. Evidence and scientific research. American Journal of Public Health, 1988.

[24] B. Fischhoff, et al. Assessing uncertainty in physical constants. 1986.

[25] M. Nowak, et al. Digital cows grazing on digital grounds. Current Biology, 2006.

[26] Sander Greenland. Assessing the unreliability of the medical literature: a response to "Why Most Published Research Findings Are False". 2007.

[27] A. Palmer, et al. Quasireplication and the Contract of Error: Lessons from Sex Ratios, Heritabilities and Fluctuating Asymmetry. 2000.

[28] Jie W. Weiss, et al. Bayesian Statistical Inference for Psychological Research. 2008.
