osDesign: An R Package for the Analysis, Evaluation, and Design of Two-Phase and Case-Control Studies.

The two-phase design has recently received attention in the statistical literature as an extension to the traditional case-control study for settings where a predictor of interest is rare or subject to misclassification. Despite a thorough methodological treatment and the potential for substantial efficiency gains, the two-phase design has not been widely adopted. This may be due, in part, to a lack of general-purpose, readily available software. The osDesign package for R provides a suite of functions for analyzing data from two-phase and/or case-control designs, as well as for evaluating their operating characteristics, including bias, efficiency, and power. The evaluation is simulation-based, permitting flexible application of the package to a broad range of scientific settings. Using lung cancer mortality data from Ohio, the package is illustrated with a detailed case study in which two statistical goals are considered: (i) the evaluation of small-sample operating characteristics for two-phase and case-control designs and (ii) the planning and design of a future two-phase study.
