Estimating the area under a receiver operating characteristic curve using partially ordered sets

Abstract. Ranked set sampling (RSS), a cost-effective sampling technique, requires the ranker to give a complete ranking of the units in each set. Frey (2012) proposed a modification of RSS based on partially ordered sets, referred to as RSS-t in this paper, that allows the ranker to declare as many ties as he or she wishes. We consider the problem of estimating the area under a receiver operating characteristic (ROC) curve using RSS-t samples. The area under the ROC curve (AUC) is commonly used as a measure of the effectiveness of diagnostic markers. We develop six nonparametric estimators of the AUC, based on different approaches, that either use or ignore tie information. We then compare the estimators using a Monte Carlo simulation and an empirical study with real data from the National Health and Nutrition Examination Survey. The results show that utilizing tie information increases the efficiency of estimating the AUC. We also offer practitioners guidance on which estimator to choose in which setting.
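For orientation, the AUC can be read as the probability that a marker value from a diseased subject exceeds one from a healthy subject, which under simple random sampling is commonly estimated by the Mann-Whitney statistic (Bamber, 1975). The sketch below illustrates only that standard baseline; it is not one of the six RSS-t estimators developed in the paper, and the function name `empirical_auc` and the sample values are hypothetical. Note that the half-weight for equal marker values handles ties in the measurements, which is distinct from the ranking ties that RSS-t records.

```python
def empirical_auc(healthy, diseased):
    """Mann-Whitney estimate of AUC = P(X < Y) + 0.5 * P(X = Y).

    healthy  -- marker values X from non-diseased subjects
    diseased -- marker values Y from diseased subjects
    Assumes simple random samples; RSS-t designs need adjusted estimators.
    """
    if not healthy or not diseased:
        raise ValueError("both samples must be non-empty")
    total = 0.0
    for x in healthy:
        for y in diseased:
            if x < y:
                total += 1.0   # diseased value correctly ranked higher
            elif x == y:
                total += 0.5   # equal marker values count as half
    return total / (len(healthy) * len(diseased))


# Hypothetical marker values, for illustration only.
healthy = [1.0, 2.0, 3.0]
diseased = [2.0, 4.0, 5.0]
auc = empirical_auc(healthy, diseased)
print(round(auc, 4))
```

An AUC near 0.5 indicates a marker that does not separate the two groups, while a value near 1 indicates near-perfect separation.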

[1] J. L. Clutter, et al. Ranked Set Sampling Theory with Order Statistics Background, 1972.

[2] D. Bamber. The area above the ordinal dominance graph and the area below the receiver operating characteristic graph, 1975.

[3] Susan C. Jones, et al. Abundance, Distribution, and Colony Size Estimates for Reticulitermes spp. (Isoptera: Rhinotermitidae) in Southern Mississippi, 1982.

[4] T. Sager, et al. Characterization of a Ranked-Set Sample with Application to Estimating Distribution Functions, 1988.

[5] Francisco J. Samaniego, et al. Nonparametric Maximum Likelihood Estimation Based on Ranked Set Samples, 1994.

[6] Jian Huang. Asymptotic properties of the NPMLE of a distribution function based on ranked set samples, 1997.

[7] B. Reiser, et al. Estimation of the area under the ROC curve, 2002, Statistics in medicine.

[8] S. Kotz, et al. The stress-strength model and its generalizations: theory and applications, 2003.

[9] Haiying Chen, et al. Ranked set sampling for efficient estimation of a population proportion, 2005, Statistics in medicine.

[10] Steven N. MacEachern, et al. Nonparametric Two-Sample Methods for Ranked-Set Sample Data, 2006.

[11] Lynne Stokes, et al. A nonparametric mean estimator for judgment poststratified data, 2008, Biometrics.

[12] S. Sengupta, et al. Unbiased estimation of P(X>Y) using ranked set sample data, 2008.

[13] Jesse Frey. Nonparametric mean estimation using partially ordered sets, 2012, Environmental and Ecological Statistics.

[14] Johan Lim, et al. Isotonized CDF estimation from judgment poststratification data with empty strata, 2012, Biometrics.

[15] E. Moltchanova, et al. Partial ranked set sampling design, 2013.

[16] Mohammad Jafari Jozani, et al. Mixture Model Analysis of Partially Rank‐Ordered Set Samples: Age Groups of Fish from Length‐Frequency Data, 2015.

[17] H. Samawi, et al. Rank-Based Kernel Estimation of the Area Under the ROC Curve, 2016.

[18] Lynne Stokes, et al. Using Ranked Set Sampling With Cluster Randomized Designs for Improved Inference on Treatment Effects, 2016.

[19] Testing perfect rankings in ranked‐set sampling with binary data, 2017.

[20] Ehsan Zamanzade, et al. A more efficient proportion estimator in ranked set sampling, 2017.

[21] Jesse Frey, et al. Efficiency comparisons for partially rank-ordered set sampling, 2017.

[22] Johan Lim, et al. Unbalanced ranked set sampling in cluster randomized studies, 2017.

[23] L. Dümbgen, et al. Inference on a distribution function from ranked set samples, 2013, Annals of the Institute of Statistical Mathematics.

[24] Xinlei Wang, et al. Proportion estimation in ranked set sampling in the presence of tie information, 2018, Comput. Stat.

[25] Ehsan Zamanzade, et al. Using ranked set sampling with extreme ranks in estimating the population proportion, 2020, Statistical methods in medical research.

[26] Improved Nonparametric Estimation Using Partially Ordered Sets, 2020.