Efficient logistic regression designs under an imperfect population identifier.

Motivated by actual study designs, this article considers efficient logistic regression designs where the population is identified with a binary test that is subject to diagnostic error. We consider the case where the imperfect test is obtained on all participants, while the gold standard test is measured only on a small, selected subsample. Under maximum-likelihood estimation, we evaluate the optimal design in terms of both sample selection and verification. We show that there may be substantial efficiency gains from including in the sample a small percentage of individuals who test negative on the imperfect test (e.g., verifying 90% test-positive cases). We also show that a two-stage design may be a good practical alternative to a fixed design in some situations. Using simulations, we compare maximum-likelihood and semi-parametric efficient estimators under optimal and nearly optimal designs, with both correctly specified and misspecified models. The methodology is illustrated with an analysis from a diabetes behavioral intervention trial.
