Likelihood-based confidence intervals for the risk ratio using double sampling with over-reported binary data

In this article, we derive likelihood-based confidence intervals for the risk ratio from over-reported two-sample binary data collected under a double-sampling scheme. The risk ratio is defined as the ratio of two proportion parameters. By maximizing the full likelihood function, we obtain closed-form maximum likelihood estimators for all model parameters. We then derive four confidence intervals: a naive Wald interval, a modified Wald interval, a Fieller-type interval, and an Agresti-Coull interval. All four intervals are illustrated using cervical cancer data. Finally, we conduct simulation studies to assess and compare the coverage probabilities and average lengths of the four interval estimators. We conclude that the modified Wald interval, unlike the other three, achieves close-to-nominal coverage under the simulation scenarios examined here and is therefore preferred in practice.
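
As a rough illustration of how a closed-form estimator can arise in this setting, the sketch below computes a point estimate of the risk ratio from two double samples. It is not taken from the article. It assumes that over-reporting means false positives only (a fallible negative is always a true negative), and that each sample consists of a validated subsample, classified by both the fallible and the infallible device, plus a larger subsample classified by the fallible device alone. The class `DoubleSample`, its field names, and the helper functions are hypothetical. Under these assumptions the likelihood factors into a term for the probability of a fallible positive and a term for the probability that a fallible positive is a true positive, so the estimated proportion is the product of the two sample fractions.

```python
from dataclasses import dataclass


@dataclass
class DoubleSample:
    """Counts for one group (hypothetical layout, not the article's notation)."""
    n_true_pos: int   # validated units: fallible positive, truly positive
    n_false_pos: int  # validated units: fallible positive, truly negative
    n_neg: int        # validated units: fallible negative (assumed truly negative)
    x_pos: int        # unvalidated units: fallible positive
    x_neg: int        # unvalidated units: fallible negative


def mle_proportion(s: DoubleSample) -> float:
    """Estimate the true proportion, assuming false-positive errors only."""
    total = s.n_true_pos + s.n_false_pos + s.n_neg + s.x_pos + s.x_neg
    validated_pos = s.n_true_pos + s.n_false_pos
    if total == 0 or validated_pos == 0:
        raise ValueError("need at least one validated fallible positive")
    # Fraction of all units that the fallible device labels positive.
    pi_hat = (validated_pos + s.x_pos) / total
    # Fraction of validated fallible positives that are truly positive.
    lam_hat = s.n_true_pos / validated_pos
    return pi_hat * lam_hat


def risk_ratio(s1: DoubleSample, s2: DoubleSample) -> float:
    """Point estimate of the risk ratio p1 / p2."""
    p2 = mle_proportion(s2)
    if p2 == 0:
        raise ValueError("estimated proportion in second sample is zero")
    return mle_proportion(s1) / p2


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    group1 = DoubleSample(n_true_pos=30, n_false_pos=10, n_neg=60, x_pos=80, x_neg=120)
    group2 = DoubleSample(n_true_pos=10, n_false_pos=10, n_neg=80, x_pos=40, x_neg=160)
    print(risk_ratio(group1, group2))
```

The sketch stops at the point estimate. The four interval estimators compared in the article need variance expressions that are not given here, so they are not reproduced.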
