Combining primary cohort data with external aggregate information without assuming comparability

In comparative effectiveness research (CER) for rare types of cancer, it is appealing to combine primary cohort data containing detailed tumor profiles with aggregate information derived from cancer registry databases. Such data integration may improve statistical efficiency in CER. A major challenge in combining information from different sources, however, is that the aggregate information from cancer registry databases may not be comparable with the primary cohort data, which are often collected from a single cancer center or a clinical trial. We develop an adaptive estimation procedure that uses the combined information to determine the degree of information borrowing from the aggregate data of the external source. We establish the asymptotic properties of the estimators and evaluate their finite-sample performance via simulation studies. The proposed method yields a substantial gain in statistical efficiency over the conventional method that uses the primary cohort only, and it avoids undesirable bias when the external information is not comparable with the primary cohort. We apply the proposed method to evaluate the long-term effect of trimodality treatment on inflammatory breast cancer (IBC) by tumor subtype, combining the IBC patient cohort at The University of Texas MD Anderson Cancer Center with external aggregate information from the National Cancer Data Base (NCDB).
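The abstract does not spell out the estimator, so the sketch below is only a hedged toy illustration of what "adaptively determining the degree of borrowing" can look like. It is not the authors' procedure, which concerns survival outcomes. It swaps in a simple Gaussian mean problem. The primary cohort estimates a mean mu. A hypothetical external summary ext_mean (with standard error ext_se) is assumed to estimate mu + tau, where tau is an unknown discrepancy. An adaptive-lasso penalty lam * |tau| / |d| is placed on tau, where d is the unpenalized discrepancy estimate. The function name adaptive_borrowing_mean and the parameters ext_mean, ext_se, and lam are all hypothetical.

```python
import math
import statistics


def adaptive_borrowing_mean(primary, ext_mean, ext_se, lam):
    """Toy adaptive-lasso estimate of a mean that borrows from an external summary.

    Hypothetical model (illustration only): ext_mean estimates mu + tau, where
    tau is a discrepancy between the external source and the primary cohort.
    Minimizes  w1*(xbar - mu)^2 + w2*(ext_mean - mu - tau)^2 + lam*|tau|/|d|.
    """
    n = len(primary)                   # needs n >= 2 for stdev
    xbar = statistics.fmean(primary)
    sd = statistics.stdev(primary)
    w1 = n / sd**2                     # precision of the primary-cohort mean
    w2 = 1.0 / ext_se**2               # precision of the external summary
    d = ext_mean - xbar                # unpenalized estimate of tau

    if d == 0.0:
        tau = 0.0                      # no observed discrepancy: pool fully
    else:
        # After profiling out mu, the objective in tau is W*(tau - d)^2 plus
        # the adaptive penalty, so the minimizer soft-thresholds d.
        W = w1 * w2 / (w1 + w2)
        shrink = lam / (2.0 * W * abs(d))
        tau = math.copysign(max(abs(d) - shrink, 0.0), d)

    # Inverse-variance combination after removing the estimated discrepancy.
    mu = (w1 * xbar + w2 * (ext_mean - tau)) / (w1 + w2)
    return mu, tau


cohort = [1.0, 2.0, 3.0, 4.0, 5.0]
print(adaptive_borrowing_mean(cohort, 3.1, 0.5, 1.0))  # small gap: tau shrunk to 0
print(adaptive_borrowing_mean(cohort, 8.0, 0.5, 1.0))  # large gap: little borrowing
```

Setting lam = 0 leaves tau = d, which reproduces the primary-cohort mean. A very large lam forces tau = 0, which is full inverse-variance pooling. Intermediate values let the data decide how much to borrow: a small apparent discrepancy is shrunk to zero, while a large one is mostly retained, which protects the estimate from an incomparable external source.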
