A SAS Package for Logistic Two-Phase Studies

In a two-phase design, a dichotomous outcome and partial or proxy information on risk factors are available for a large study, whereas precise or complete covariate measurements have been obtained only in a stratified subsample. Such designs extend the standard case-control design and have proven useful in practice. Their application, however, seems to be hampered by the lack of appropriate, easy-to-use software. This paper introduces sas-twophase-package, a collection of SAS macros, to fill this gap. sas-twophase-package implements weighted likelihood, pseudo-likelihood, and semiparametric maximum likelihood estimation via the EM algorithm and via profile likelihood in two-phase settings with a dichotomous outcome and a given stratification.
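To make the simplest of these estimators concrete, the following is a minimal sketch of weighted likelihood estimation under the usual assumption that each phase-two subject is weighted by the inverse sampling fraction of its stratum (phase-one count divided by phase-two count). It is written in Python for illustration only and is not the package's implementation; the function names, the stratum counts, and the toy data are all hypothetical.

```python
import math

def sampling_weights(phase1_counts, phase2_counts):
    """Inverse sampling fractions N_j / n_j per stratum j = (outcome, proxy)."""
    return {j: phase1_counts[j] / phase2_counts[j] for j in phase2_counts}

def weighted_logistic(x, y, w, iters=50, tol=1e-10):
    """Weighted logistic regression of y on a single covariate x.

    Maximizes the weighted log-likelihood by Newton-Raphson and
    returns (intercept, slope).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = wi * (yi - p)          # weighted score contribution
            v = wi * p * (1.0 - p)     # weighted information contribution
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical phase-one counts N_j and phase-two counts n_j,
# indexed by (outcome y, proxy stratum s).
phase1 = {(0, 0): 400, (0, 1): 100, (1, 0): 60, (1, 1): 40}
phase2 = {(0, 0): 40, (0, 1): 50, (1, 0): 30, (1, 1): 40}
weights = sampling_weights(phase1, phase2)

# Hypothetical phase-two data: number of subjects with precise exposure x = 1
# in each stratum; the remaining subjects in the stratum have x = 0.
exposed = {(0, 0): 8, (0, 1): 30, (1, 0): 12, (1, 1): 30}

x, y, w = [], [], []
for (yj, sj), n in phase2.items():
    for i in range(n):
        x.append(1 if i < exposed[(yj, sj)] else 0)
        y.append(yj)
        w.append(weights[(yj, sj)])

intercept, slope = weighted_logistic(x, y, w)
print(intercept, slope)
```

Because the covariate in this toy example is binary, the fitted slope equals the log odds ratio of the weighted 2x2 table, which gives a quick sanity check on the fit. The point estimates aside, the naive standard errors from a weighted fit are not valid under two-phase sampling; variance estimation is outside the scope of this sketch.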
