TwoPhaseInd: an R package for estimating gene-treatment interactions and discovering predictive markers in randomized clinical trials

In randomized clinical trials, identifying baseline genetic or genomic markers that predict subgroup treatment effects is of growing interest. Because measuring markers on every participant is costly, markers are often measured on a subset of participants selected by outcome-dependent sampling. The R package TwoPhaseInd implements several efficient statistical methods we developed for estimating subgroup treatment effects and gene-treatment interactions. These methods exploit the gene-treatment independence guaranteed by randomization and include the case-only estimator, the maximum estimated likelihood estimator and the semiparametric maximum likelihood estimator for parameters in a logistic model. For rare failure events subject to censoring, we have proposed efficient augmented case-only designs, a variation of the case-cohort design, to estimate genetic associations and subgroup treatment effects in a Cox regression model. The R package is computationally scalable to genome-wide studies, as illustrated by an example from the Women's Health Initiative.

AVAILABILITY AND IMPLEMENTATION: The R package TwoPhaseInd is available from http://cran.r-project.org/web/packages.

CONTACT: jdai@fredhutch.org
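The idea behind the case-only estimator named in the abstract can be illustrated with a short sketch. It is written in Python for illustration only and does not use or reproduce the TwoPhaseInd API; the function names, the binary coding of marker and treatment, and all parameter values are assumptions made for this example. When the marker G is independent of the treatment T (as randomization guarantees) and the outcome is rare, the log odds ratio between G and T among cases alone approximates the gene-treatment interaction coefficient of the logistic model.

```python
import math


def case_only_interaction(n11, n10, n01, n00):
    """Case-only estimate of a gene-treatment interaction (illustrative helper).

    nGT is the number of cases with binary marker G and treatment arm T.
    Returns the log odds ratio of G by T among cases and its Wald-type
    standard error from the usual 2x2-table formula.
    """
    log_or = math.log((n11 * n00) / (n10 * n01))
    se = math.sqrt(1 / n11 + 1 / n10 + 1 / n01 + 1 / n00)
    return log_or, se


def expected_case_counts(n, p_g, p_t, b0, b_g, b_t, b_gt):
    """Expected number of cases per (G, T) cell in a trial of size n.

    Assumes G and T are independent and that disease risk follows
    logit P(D=1 | G, T) = b0 + b_g*G + b_t*T + b_gt*G*T.
    """
    counts = {}
    for g in (0, 1):
        for t in (0, 1):
            p_cell = (p_g if g else 1 - p_g) * (p_t if t else 1 - p_t)
            linear = b0 + b_g * g + b_t * t + b_gt * g * t
            risk = 1 / (1 + math.exp(-linear))
            counts[(g, t)] = n * p_cell * risk
    return counts


# Hypothetical trial: rare outcome (b0 = -5), true interaction b_gt = 0.7.
counts = expected_case_counts(
    n=100_000, p_g=0.3, p_t=0.5, b0=-5.0, b_g=0.2, b_t=-0.3, b_gt=0.7
)
log_or, se = case_only_interaction(
    counts[(1, 1)], counts[(1, 0)], counts[(0, 1)], counts[(0, 0)]
)
print(f"case-only log OR: {log_or:.3f} (true interaction: 0.7), SE: {se:.3f}")
```

Because only cases enter the calculation, no marker measurements are needed on controls. The approximation relies on the rare-outcome assumption and on the independence of marker and treatment; the small gap between the printed estimate and 0.7 reflects the outcome not being infinitely rare.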