Expected p-values in light of an ROC curve analysis applied to optimal multiple testing procedures

Many statistical studies report p-values for inferential purposes. In several scenarios, the stochastic aspect of p-values is neglected, which may contribute to wrong conclusions in real data experiments. Because p-values are random variables, it is difficult to use them directly to examine the performance of given testing procedures or associations between investigated factors. We turn to the modern statistical literature to address the expected p-value (EPV) as a measure of the performance of decision-making rules. We prove that the EPV can be considered in the context of receiver operating characteristic (ROC) curve analysis, a well-established biostatistical methodology. The ROC-based framework provides a new and efficient methodology for investigating and constructing statistical decision-making procedures, including: (1) evaluating and visualizing properties of testing mechanisms, considering, e.g., partial EPVs; (2) developing optimal tests via the minimization of EPVs; and (3) creating novel methods for optimally combining multiple test statistics. We demonstrate that the proposed EPV-based approach allows us to maximize the integrated power of testing algorithms with respect to various significance levels. In an application, we use the proposed method to construct the optimal test and analyze a myocardial infarction dataset. We outline the usefulness of the “EPV/ROC” technique for evaluating different decision-making procedures, their construction, and their properties, with an eye towards practical applications.
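
The link between the EPV and the ROC curve can be illustrated with a small simulation. The sketch below is not the authors' implementation. It assumes a one-sided z-test whose statistic is N(0, 1) under the null hypothesis and N(delta, 1) under the alternative; the function names and the parameter `delta` are hypothetical and chosen for illustration only. For a continuous test statistic, the EPV equals the probability that a null draw of the statistic is at least as large as an independent draw under the alternative, that is, one minus the area under the ROC curve (AUC) that compares the two distributions.

```python
import random
from statistics import NormalDist


def simulate_epv_and_auc(delta, n_sim=20000, seed=0):
    """Monte Carlo estimates of the EPV and the AUC for a one-sided z-test.

    Assumption (illustrative): T ~ N(0, 1) under H0 and T ~ N(delta, 1)
    under H1, with large values of T taken as evidence against H0.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()

    # Independent draws of the test statistic under H0 and under H1.
    t_null = [rng.gauss(0.0, 1.0) for _ in range(n_sim)]
    t_alt = [rng.gauss(delta, 1.0) for _ in range(n_sim)]

    # EPV: the average one-sided p-value when the alternative is true.
    epv = sum(1.0 - std_normal.cdf(t) for t in t_alt) / n_sim

    # AUC: estimate of P(T_alt > T_null) from independent pairs of draws.
    auc = sum(a > b for a, b in zip(t_alt, t_null)) / n_sim
    return epv, auc


def exact_epv(delta):
    """Closed form under the same assumption: P(T_null >= T_alt)."""
    return NormalDist().cdf(-delta / 2 ** 0.5)


if __name__ == "__main__":
    epv, auc = simulate_epv_and_auc(delta=1.0)
    print(f"simulated EPV = {epv:.3f}, 1 - AUC = {1.0 - auc:.3f}, "
          f"closed form = {exact_epv(1.0):.3f}")
```

Under these assumptions, a larger `delta` gives a smaller EPV and a larger AUC, so minimizing the EPV corresponds to maximizing the AUC of the testing procedure.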
