Statistical significance: p value, 0.05 threshold, and applications to radiomics—reasons for a conservative approach

Here, we summarise the unresolved debate about the p value and its dichotomisation. We present the statement of the American Statistical Association against the misuse of statistical significance, as well as the proposals to abandon the use of p values and to reduce the significance threshold from 0.05 to 0.005. We highlight reasons for a conservative approach: clinical research needs dichotomous answers to guide decision-making, in particular in diagnostic imaging and interventional radiology. A lower p value threshold could increase the cost of research and reduce spontaneous research. Secondary evidence from systematic reviews/meta-analyses, data sharing, and cost-effectiveness analyses offers better ways to mitigate the false discovery rate and the lack of reproducibility associated with the 0.05 threshold. Importantly, when reporting p values, authors should always provide the actual value, not only statements such as "p < 0.05" or "p ≥ 0.05", because the p value measures the degree to which the data are compatible with the null hypothesis. Notably, radiomics and big data, fuelled by artificial intelligence, involve testing hundreds or thousands of features, similarly to other "omics" such as genomics, where a lower significance threshold based on well-known corrections for multiple testing has already been adopted.
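The multiple-testing argument can be made concrete with a small calculation. This is an illustrative sketch only: the abstract does not say which corrections are used in radiomics or genomics, so Bonferroni (family-wise error rate) and Benjamini-Hochberg (false discovery rate) serve here as two common examples, and the prior probability of a true effect, the power, and the p values are hypothetical inputs. The helper names (`false_positive_risk`, `bonferroni`, `benjamini_hochberg`) are our own, not from any library. The sketch also computes the share of "significant" results expected to be false positives at α = 0.05 versus α = 0.005, which is the trade-off behind the proposal to lower the threshold. For scale, the genome-wide threshold of 5 × 10^-8 corresponds roughly to a Bonferroni correction of 0.05 for one million independent tests.

```python
"""Illustrative sketch: significance thresholds and multiple-testing corrections.

All inputs (prior probability, power, p values) are hypothetical values chosen
for illustration, not data from any radiomics study. Function names are
illustrative helpers, not a library API.
"""
from typing import List


def false_positive_risk(alpha: float, power: float, prior_true: float) -> float:
    """Expected share of 'significant' results that are false positives.

    prior_true: assumed fraction of tested hypotheses that are truly non-null.
    power: assumed probability of detecting a true effect at level alpha.
    """
    false_pos = alpha * (1.0 - prior_true)  # true nulls wrongly rejected
    true_pos = power * prior_true           # real effects correctly detected
    return false_pos / (false_pos + true_pos)


def bonferroni(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Reject H0 when p <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values: List[float], q: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate).

    Sort p values ascending, find the largest rank k with p_(k) <= (k / m) * q,
    then reject the k hypotheses with the smallest p values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


if __name__ == "__main__":
    # Hypothetical scenario: 10% of tested hypotheses are true, 80% power.
    for alpha in (0.05, 0.005):
        fpr = false_positive_risk(alpha, power=0.8, prior_true=0.1)
        print(f"alpha={alpha}: false positive risk = {fpr:.3f}")

    # Hypothetical p values from screening 10 radiomic features.
    p = [0.0001, 0.0008, 0.004, 0.012, 0.03, 0.041, 0.049, 0.2, 0.5, 0.9]
    print("uncorrected p < 0.05:", sum(pi < 0.05 for pi in p))
    print("Bonferroni rejections:", sum(bonferroni(p)))
    print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg(p)))
```

With these illustrative inputs, the false positive risk falls from 0.36 at α = 0.05 to about 0.053 at α = 0.005. Of the seven features with uncorrected p < 0.05, three survive Bonferroni and four survive Benjamini-Hochberg, showing why FDR control is often preferred when many features are screened.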
