When Null Hypothesis Significance Testing Is Unsuitable for Research: A Reassessment
[1] Howard Bowman,et al. I Tried a Bunch of Things: The Dangers of Unexpected Overfitting in Classification , 2016, bioRxiv.
[2] J. Shaffer. Multiple Hypothesis Testing , 1995 .
[3] R. Leech,et al. Neuroadaptive Bayesian Optimization and Hypothesis Testing , 2017, Trends in Cognitive Sciences.
[4] J. Ioannidis,et al. Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature , 2017, PLoS biology.
[5] J. Ioannidis,et al. Outcome reporting bias in clinical trials: why monitoring matters , 2017, British Medical Journal.
[6] Thomas E. Nichols,et al. Best practices in data analysis and sharing in neuroimaging using MRI , 2017, Nature Neuroscience.
[7] Yolanda Gil,et al. Enhancing reproducibility for computational methods , 2016, Science.
[8] Denes Szucs,et al. A Tutorial on Hunting Statistical Significance by Chasing N , 2016, Front. Psychol..
[9] Hans Knutsson,et al. Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates , 2016, Proceedings of the National Academy of Sciences.
[10] N. Lazar,et al. The ASA Statement on p-Values: Context, Process, and Purpose , 2016 .
[11] J. Ioannidis,et al. Evolution of Reporting P Values in the Biomedical Literature, 1990-2015. , 2016, JAMA.
[12] J. Vandekerckhove,et al. A Bayesian Perspective on the Reproducibility Project: Psychology , 2016, PloS one.
[13] John P. A. Ioannidis,et al. p-Curve and p-Hacking in Observational Research , 2016, PloS one.
[14] J. Ioannidis,et al. Registration practices for observational studies on ClinicalTrials.gov indicated low adherence. , 2016, Journal of clinical epidemiology.
[15] James O. Berger,et al. Rejection odds and rejection ratios: A proposal for statistical practice in testing hypotheses , 2015, Journal of mathematical psychology.
[16] Michèle B. Nuijten,et al. The prevalence of statistical reporting errors in psychology (1985–2013) , 2015, Behavior Research Methods.
[17] Jeffrey N. Rouder,et al. The fallacy of placing confidence in confidence intervals , 2015, Psychonomic bulletin & review.
[18] Mandy Eberhart. Teaching Students To Read , 2016 .
[19] Doreen Eichel,et al. Data Analysis A Bayesian Tutorial , 2016 .
[20] Isabelle Boutron,et al. Classification and prevalence of spin in abstracts of non-randomized studies evaluating an intervention , 2015, BMC Medical Research Methodology.
[21] John P A Ioannidis,et al. Assessment of vibration of effects due to model specification can demonstrate the instability of observational associations. , 2015, Journal of clinical epidemiology.
[22] Michael C. Frank,et al. Estimating the reproducibility of psychological science , 2015, Science.
[23] R. Kaplan,et al. Likelihood of Null Effects of Large NHLBI Clinical Trials Has Increased over Time , 2015, PloS one.
[24] Brian A. Nosek,et al. Promoting an open research culture , 2015, Science.
[25] Jean-Baptiste Poline,et al. Improving functional magnetic resonance imaging reproducibility , 2015, GigaScience.
[26] A. Gelman. The Connection Between Varying Treatment Effects and the Crisis of Unreplicable Research , 2015 .
[27] Carol Jagger,et al. Assessing the validity of the Global Activity Limitation Indicator in fourteen European countries , 2015, BMC Medical Research Methodology.
[28] Anders Engberg-Pedersen. Empire of chance , 2015 .
[29] Michèle B. Nuijten,et al. Statistical Reporting Errors and Collaboration on Statistical Analyses in Psychological Science , 2014, PloS one.
[30] John P. A. Ioannidis,et al. Big data meets public health , 2014, Science.
[31] Sally Hopewell,et al. Impact of spin in the abstracts of articles reporting results of randomized controlled trials in the field of cancer: the SPIIN randomized controlled trial. , 2014, Journal of clinical oncology : official journal of the American Society of Clinical Oncology.
[32] John P. A. Ioannidis,et al. How to Make More Published Research True , 2014, PLoS medicine.
[33] Mika Kivimäki,et al. Don't Let the Truth Get in the Way of a Good Story: An Illustration of Citation Bias in Epidemiologic Research , 2014, American journal of epidemiology.
[34] John P A Ioannidis,et al. Placing epidemiological results in the context of multiplicity and typical correlations of exposures , 2014, Journal of Epidemiology & Community Health.
[35] John P A Ioannidis,et al. Studying the elusive environment in large scale. , 2014, JAMA.
[36] Jelle J. Goeman,et al. Multiple hypothesis testing in genomics , 2014, Statistics in medicine.
[37] D. Lakens,et al. Sailing From the Seas of Chaos Into the Corridor of Stability , 2014, Perspectives on psychological science : a journal of the Association for Psychological Science.
[38] Leif D. Nelson,et al. P-Curve and Effect Size: Correcting for Publication Bias Using Only Significant Results , 2014 .
[39] F. De Filippis,et al. A Selected Core Microbiome Drives the Early Stages of Three Popular Italian Cheese Manufactures , 2014, PloS one.
[40] John P A Ioannidis,et al. Improving the drug development process: more not less randomized trials. , 2014, JAMA.
[41] R. Tibshirani,et al. Increasing value and reducing waste in research design, conduct, and analysis , 2014, The Lancet.
[42] Jeffrey N. Rouder,et al. Robust misinterpretation of confidence intervals , 2013, Psychonomic bulletin & review.
[43] Leif D. Nelson,et al. P-Curve: A Key to the File Drawer , 2013, Journal of experimental psychology. General.
[44] A. Gelman,et al. The statistical crisis in science , 2014 .
[45] Andrew Gelman,et al. Data-dependent analysis—a "garden of forking paths"— explains why many statistically significant comparisons don't hold up. , 2014 .
[46] I. Kawachi,et al. Don't Let the Truth Get in the Way of a Good Story: An , 2014 .
[47] Published Online. Biomedical research: increasing value, reducing waste , 2014 .
[48] S. Goodman,et al. Raw data from clinical trials: within reach? , 2013, Trends in pharmacological sciences.
[49] Andrew Gelman,et al. Interrogating p-values , 2013 .
[50] J. Ioannidis,et al. Meta-analysis methods for genome-wide association studies and beyond , 2013, Nature Reviews Genetics.
[51] Brian A. Nosek,et al. Power failure: why small sample size undermines the reliability of neuroscience , 2013, Nature Reviews Neuroscience.
[52] T. Perneger,et al. Citation bias favoring statistically significant studies was present in medical research. , 2013, Journal of clinical epidemiology.
[53] John P A Ioannidis,et al. Is everything we eat associated with cancer? A systematic cookbook review. , 2013, The American journal of clinical nutrition.
[54] G. Cumming. The New Statistics: Why and How , 2013 .
[55] Andrew Gelman,et al. P values and statistical practice. , 2013, Epidemiology.
[56] J. Ioannidis. Why Science Is Not Necessarily Self-Correcting , 2012, Perspectives on psychological science : a journal of the Association for Psychological Science.
[57] H. Pashler,et al. Is the Replicability Crisis Overblown? Three Arguments Examined , 2012, Perspectives on psychological science : a journal of the Association for Psychological Science.
[58] Joshua Carp,et al. The secret lives of experiments: Methods reporting in the fMRI literature , 2012, NeuroImage.
[59] Hans Knutsson,et al. Does Parametric fMRI Analysis with SPM Yield Valid Results? An Empirical Study of 1484 Rest Datasets , 2012 .
[60] Jeffrey R. Spies,et al. Scientific Utopia: II. Restructuring incentives and practices to promote truth over publishability , 2012, 1205.4251.
[61] G. Loewenstein,et al. Measuring the Prevalence of Questionable Research Practices With Incentives for Truth Telling , 2012, Psychological science.
[62] C. Begley,et al. Drug development: Raise standards for preclinical cancer research , 2012, Nature.
[63] John P A Ioannidis,et al. What Should the Genome-wide Significance Threshold Be? Empirical Replication of Borderline Genetic Associations , 2012 .
[64] C. Glenn Begley,et al. Raise standards for preclinical cancer research , 2012 .
[65] R. Peng. Reproducible Research in Computational Science , 2011, Science.
[66] Leif D. Nelson,et al. False-Positive Psychology , 2011, Psychological science.
[67] J. Ioannidis,et al. Risk factors and interventions with statistically significant tiny effects. , 2011, International journal of epidemiology.
[68] J. Ioannidis,et al. The False-positive to False-negative Ratio in Epidemiologic Studies , 2011, Epidemiology.
[69] J. Wicherts,et al. The (mis)reporting of statistical results in psychology journals , 2011, Behavior research methods.
[70] Wei Liu,et al. Testing Statistical Hypotheses of Equivalence and Noninferiority, 2nd edn by Stefan Wellek , 2011 .
[71] E. Wagenmakers,et al. Why psychologists must change the way they analyze their data: the case of psi: comment on Bem (2011). , 2011, Journal of personality and social psychology.
[72] F. Godlee,et al. Wakefield’s article linking MMR vaccine and autism was fraudulent , 2011, BMJ : British Medical Journal.
[73] B. Deer,et al. How the case against the MMR vaccine was fixed , 2011, BMJ : British Medical Journal.
[74] Yoav Benjamini,et al. Simultaneous and selective inference: Current successes and future challenges , 2010, Biometrical journal. Biometrische Zeitschrift.
[75] John K Kruschke,et al. Bayesian data analysis. , 2010, Wiley interdisciplinary reviews. Cognitive science.
[76] Peter J Diggle,et al. Embracing the concept of reproducible research. , 2010, Biostatistics.
[77] Niels Keiding,et al. Reproducible research and the substantive context. , 2010, Biostatistics.
[78] S. Wellek. Testing Statistical Hypotheses of Equivalence and Noninferiority , 2010 .
[79] Douglas G Altman,et al. Reporting and interpretation of randomized controlled trials with statistically nonsignificant results for primary outcomes. , 2010, JAMA.
[80] E. Boersma,et al. Prevention of Catheter-Related Bacteremia with a Daily Ethanol Lock in Patients with Tunnelled Catheters: A Randomized, Placebo-Controlled Trial , 2010, PloS one.
[81] D. Fanelli. Do Pressures to Publish Increase Scientists' Bias? An Empirical Support from US States Data , 2010, PloS one.
[82] Maarten H. P. Ambaum,et al. Significance Tests in Climate Science , 2010, 1003.2934.
[83] Matko Marušić,et al. Can Teaching Research Methodology Influence Students' Attitude Toward Science? Cohort Study and Nonrandomized Trial in a Single Medical School , 2010, Journal of Investigative Medicine.
[84] L. Hedges,et al. The Handbook of Research Synthesis and Meta-Analysis , 2009 .
[85] Michael B. Miller,et al. The principled control of false positives in neuroimaging. , 2009, Social cognitive and affective neuroscience.
[86] Steven A Greenberg,et al. How citation distortions create unfounded authority: analysis of a citation network , 2009, BMJ : British Medical Journal.
[87] Roger D Peng,et al. Reproducible research and Biostatistics. , 2009, Biostatistics.
[88] Patrick Onghena,et al. How Confident are Students in their Misconceptions about Hypothesis Tests? , 2009 .
[89] H. Pashler,et al. Puzzlingly High Correlations in fMRI Studies of Emotion, Personality, and Social Cognition , 2009, Perspectives on psychological science : a journal of the Association for Psychological Science.
[90] W. K. Simmons,et al. Circular analysis in systems neuroscience: the dangers of double dipping , 2009, Nature Neuroscience.
[91] A. Meysamie,et al. Teaching critical appraisal and statistics in anesthesia journal club. , 2008, QJM : monthly journal of the Association of Physicians.
[92] Olle Häggström,et al. The Cult of Statistical Significance , 2009 .
[93] Andrew Gelman,et al. Why We (Usually) Don't Have to Worry About Multiple Comparisons , 2009, 0907.2478.
[94] J. Ioannidis. Why Most Discovered True Associations Are Inflated , 2008, Epidemiology.
[95] D. Murdoch,et al. P-Values are Random Variables , 2008 .
[96] S. Goodman. A dirty dozen: twelve p-value misconceptions. , 2008, Seminars in hematology.
[97] E. Wagenmakers. A practical solution to the pervasive problems of p values , 2007, Psychonomic bulletin & review.
[98] J. Harnad. Trouble with Physics , 2007, 0709.1728.
[99] P. Donnelly,et al. Replicating genotype–phenotype associations , 2007, Nature.
[100] J. Ioannidis,et al. An exploratory test for an excess of significant findings , 2007, Clinical trials.
[101] S. Goodman,et al. Reproducible Research: Moving toward Research the Public Can Really Trust , 2007, Annals of Internal Medicine.
[102] George Liberopoulos,et al. Selection in Reported Epidemiological Risks: An Empirical Assessment , 2007, PLoS medicine.
[103] Wim Van Den Noortgate,et al. Students’ misconceptions of statistical inference: A review of the empirical evidence from research on statistics education , 2007 .
[104] R. Poldrack. Can cognitive processes be inferred from neuroimaging data? , 2006, Trends in Cognitive Sciences.
[105] J. Ioannidis. Why Most Published Research Findings Are False , 2005, PLoS medicine.
[106] Gerd Gigerenzer,et al. “A 30% Chance of Rain Tomorrow”: How Does the Public Understand Probabilistic Weather Forecasts? , 2005, Risk analysis : an official publication of the Society for Risk Analysis.
[107] I. Hozo,et al. Evaluation of new treatments in radiation oncology: are they better than standard treatments? , 2005, JAMA.
[108] Niels G. Waller,et al. The fallacy of the null hypothesis in soft psychology , 2004 .
[109] F. Roe,et al. The Empire , 2004, Calixtus II (1119-1124): A Pope Born to Rule.
[110] R. D. Rosenkrantz,et al. The significance test controversy , 1972, Synthese.
[111] David Kaplan,et al. The Sage handbook of quantitative methodology for the social sciences , 2004 .
[112] Gerd Gigerenzer,et al. Do Studies of Statistical Power Have an Effect on the Power of Studies? , 2004 .
[113] G. Gigerenzer. Mindless statistics , 2004 .
[114] David J. C. MacKay,et al. Information Theory, Inference, and Learning Algorithms , 2004, IEEE Transactions on Information Theory.
[115] G. Gigerenzer,et al. The null ritual : What you always wanted to know about significance testing but were afraid to ask , 2004 .
[116] Matko Marušić,et al. Teaching Students How to Read and Write Science: A Mandatory Course on Scientific Research and Communication in Medicine , 2003, Academic medicine : journal of the Association of American Medical Colleges.
[117] Thomas E. Nichols,et al. Controlling the familywise error rate in functional neuroimaging: a comparative review , 2003, Statistical methods in medical research.
[118] M. J. Bayarri,et al. Confusion Over Measures of Evidence (p's) Versus Errors (α's) in Classical Statistical Testing , 2003 .
[119] John D. Storey,et al. Statistical significance for genomewide studies , 2003, Proceedings of the National Academy of Sciences of the United States of America.
[120] M. Tribus,et al. Probability theory: the logic of science , 2003 .
[121] Jonathan A C Sterne,et al. Teaching hypothesis tests – time for significant change? , 2002, Statistics in medicine.
[122] C. Gluud,et al. Citation bias of hepato-biliary randomized clinical trials. , 2002, Journal of clinical epidemiology.
[123] N. Leech,et al. Problems With Null Hypothesis Significance Testing (NHST): What Do the Textbooks Say? , 2002 .
[124] L.. HARKing: Hypothesizing After the Results are Known , 2002 .
[125] Y. Benjamini,et al. THE CONTROL OF THE FALSE DISCOVERY RATE IN MULTIPLE TESTING UNDER DEPENDENCY , 2001 .
[126] G A Morgan,et al. Problems with null hypothesis significance testing. , 2001, Journal of the American Academy of Child and Adolescent Psychiatry.
[127] M. J. Bayarri,et al. Calibration of p Values for Testing Precise Null Hypotheses , 2001 .
[128] Jonathan A C Sterne,et al. Sifting the evidence—what's wrong with significance tests? , 2001, BMJ : British Medical Journal.
[129] D Curran-Everett,et al. Multiple comparisons: philosophies and illustrations. , 2000, American journal of physiology. Regulatory, integrative and comparative physiology.
[130] R. Nickerson,et al. Null hypothesis significance testing: a review of an old and continuing controversy. , 2000, Psychological methods.
[131] Y. Lee. An Empirical Assessment , 2000 .
[132] Francis Tuerlinckx,et al. Type S error rates for classical and Bayesian single and multiple comparison procedures , 2000, Comput. Stat..
[133] D. Krantz. The Null Hypothesis Testing Controversy in Psychology , 1999 .
[134] S. Goodman. Toward Evidence-Based Medical Statistics. 1: The P Value Fallacy , 1999, Annals of Internal Medicine.
[135] Gerd Gigerenzer. We need statistical thinking, not statistical rituals , 1998, Behavioral and Brain Sciences.
[136] M. Olson,et al. Misconceptions About Sample Size, Statistical Significance, and Treatment Effect , 1997 .
[137] W. Johnson,et al. A Bayesian perspective on the Bonferroni adjustment , 1997 .
[138] R T O'Neill,et al. The behavior of the P-value when the alternative hypothesis is true. , 1997, Biometrics.
[139] J. Hunter. Needed: A Ban on the Significance Test , 1997 .
[140] F. Schmidt. Statistical Significance Testing and Cumulative Knowledge in Psychology: Implications for Training of Researchers , 1996 .
[141] R. Rosenthal,et al. Statistical power: concepts, procedures, and applications. , 1996, Behaviour research and therapy.
[142] Theodor D. Sterling,et al. Publication decisions revisited: the effect of the outcome of statistical tests on the decision to p , 1995 .
[143] Y. Benjamini,et al. Controlling the false discovery rate: a practical and powerful approach to multiple testing , 1995 .
[144] Jacob Cohen. The earth is round (p < .05) , 1994 .
[145] D L DeMets,et al. Interim analysis: the alpha spending function approach. , 1994, Statistics in medicine.
[146] D. Moher,et al. Statistical power, sample size, and their reporting in randomized controlled trials. , 1994, JAMA.
[147] R. P. Carver. The Case Against Statistical Significance Testing, Revisited , 1993 .
[148] S. Goodman,et al. p values, hypothesis tests, and likelihood: implications for epidemiology of a neglected historical debate. , 1993, American journal of epidemiology.
[149] D. Lindley,et al. The Analysis of Experimental Data: The Appreciation of Tea and Wine , 1993 .
[150] Frank L. Schmidt,et al. What do data really mean? Research findings, meta-analysis, and cumulative knowledge in psychology. , 1992 .
[151] J. Rossi,et al. Statistical power of psychological research: what have we gained in 20 years? , 1990, Journal of consulting and clinical psychology.
[152] P. Meehl. Why Summaries of Research on Psychological Theories are Often Uninterpretable , 1990 .
[153] G. Guyatt,et al. Measurement of health status. Ascertaining the minimal clinically important difference. , 1989, Controlled clinical trials.
[154] G. Gigerenzer,et al. Do studies of statistical power have an effect on the power of studies , 1989 .
[155] J. Berger. Statistical Decision Theory and Bayesian Analysis , 1988 .
[156] R. Duncan Luce,et al. The Tools-to-Theory Hypothesis. , 1988 .
[157] Judea Pearl,et al. Probabilistic reasoning in intelligent systems , 1988 .
[158] J. Berger,et al. Testing Precise Hypotheses , 1987 .
[159] Jeanette G. Grasselli,et al. “On the Relative Motion of the Earth and the Luminiferous Ether” , 1987 .
[160] P. Pollard,et al. On the probability of making Type I errors. , 1987 .
[161] J. Berger,et al. Testing a Point Null Hypothesis: The Irreconcilability of P Values and Evidence , 1987 .
[162] G. Gigerenzer,et al. Cognition as Intuitive Statistics , 1987 .
[163] M. Oakes. Statistical Inference: A Commentary for the Social and Behavioural Sciences , 1986 .
[164] James O. Berger,et al. Statistical Decision Theory and Bayesian Analysis, Second Edition , 1985 .
[165] P. Meehl. Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. , 1978 .
[166] Warren J. Ewens,et al. Likelihood: An account of the statistical concept of likelihood and its application to scientific inference. , 1973 .
[167] D. Lykken. Statistical significance in psychological research. , 1968, Psychological bulletin.
[168] P. Meehl. Theory-Testing in Psychology and Physics: A Methodological Paradox , 1967, Philosophy of Science.
[169] D. Bakan,et al. The test of significance in psychological research. , 1966, Psychological bulletin.
[170] The British Journal for the Philosophy of Science , 1957, Nature.
[171] Cherry Ann Clark. Chapter I: Hypothesis Testing in Relation to Statistical Methodology , 1963 .
[172] Cherry Ann Clark. Hypothesis Testing in Relation to Statistical Methodology , 1963 .
[173] Jacob Cohen,et al. The statistical power of abnormal-social psychological research: a review. , 1962, Journal of abnormal and social psychology.
[174] Jum C. Nunnally,et al. The Place of Statistics in Psychology , 1960 .
[175] W. W. Rozeboom. The fallacy of the null-hypothesis significance test. , 1960, Psychological bulletin.
[176] H. Eysenck,et al. The concept of statistical significance and the controversy about one-tailed tests. , 1960, Psychological review.
[177] Walter L. Smith. Probability and Statistics , 1959, Nature.
[178] T. Sterling. Publication Decisions and their Possible Effects on Inferences Drawn from Tests of Significance—or Vice Versa , 1959 .
[179] M. S. Bartlett,et al. Statistical methods and scientific inference. , 1957 .
[180] H. B. Webb. The measurement of health. , 1956, A.M.A. archives of industrial health.
[181] Frank Yates,et al. The Influence of Statistical Methods for Research Workers on the Development of the Science of Statistics , 1951 .
[182] Taylor Francis Online,et al. The American statistician , 1947 .
[183] Joseph Berkson,et al. Some Difficulties of Interpretation Encountered in the Application of the Chi-Square Test , 1938 .
[184] M. Kendall. Statistical Methods for Research Workers , 1937, Nature.
[185] E. S. Pearson,et al. On the Problem of the Most Efficient Tests of Statistical Hypotheses , 1933 .
[186] L. M. M.-T.. Theory of Probability , 1929, Nature.
[187] Roland P. Falkner,et al. History of statistics , 1891 .
[188] A. Michelson,et al. On the relative motion of the Earth and the luminiferous ether , 1887, American Journal of Science.