Development, Validation, and Dissemination of a Breast Cancer Recurrence Detection and Timing Informatics Algorithm

Background: This study developed, validated, and disseminated a generalizable informatics algorithm for detecting breast cancer recurrence and timing using a gold standard measure of recurrence coupled with data derived from a readily available common data model that pools health insurance claims and electronic health records data.

Methods: The algorithm has two parts: to detect the presence of recurrence and to estimate the timing of recurrence. The primary data source was the Cancer Research Network Virtual Data Warehouse (VDW). Sixteen potential indicators of recurrence were considered for model development. The final recurrence detection and timing models were determined, respectively, by maximizing the area under the ROC curve (AUROC) and minimizing average absolute error. Detection and timing algorithms were validated using VDW data in comparison with a gold standard recurrence capture from a third site in which recurrences were validated through chart review. Performance of this algorithm, stratified by stage at diagnosis, was compared with other published algorithms. All statistical tests were two-sided.

Results: Detection model AUROCs were 0.939 (95% confidence interval [CI] = 0.917 to 0.955) in the training data set (n = 3370) and 0.956 (95% CI = 0.944 to 0.971) and 0.900 (95% CI = 0.872 to 0.928), respectively, in the two validation data sets (n = 3370 and 3961, respectively). Timing models yielded average absolute prediction errors of 12.6% (95% CI = 10.5% to 14.5%) in the training data and 11.7% (95% CI = 9.9% to 13.5%) and 10.8% (95% CI = 9.6% to 12.2%) in the validation data sets, respectively, and were statistically significantly lower by 12.6% (95% CI = 8.8% to 16.5%, P < .001) than those estimated using previously reported timing algorithms. Similar covariates were included in both detection and timing algorithms but differed substantially from previous studies.

Conclusions: Valid and reliable detection of recurrence using data derived from electronic medical records and insurance claims is feasible. These tools will enable extensive, novel research on quality, effectiveness, and outcomes for breast cancer patients and those who develop recurrence.
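
The two selection criteria named in the Methods, AUROC for detection and average absolute error for timing, can be illustrated with a minimal sketch. This is not the study's implementation: the function names, the toy labels, scores, and times are all hypothetical, and the timing error here is a plain mean absolute difference in months, whereas the abstract reports errors as percentages without stating how they were normalized.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Equals the probability that a randomly chosen positive (recurrence)
    case receives a higher score than a randomly chosen negative case,
    with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_absolute_error(true_times: Sequence[float],
                        predicted_times: Sequence[float]) -> float:
    """Average of |true - predicted| over all cases (same units as input)."""
    if len(true_times) != len(predicted_times) or len(true_times) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_times, predicted_times))
    return total / len(true_times)


# Hypothetical toy data, not taken from the study.
toy_labels = [1, 0, 1, 0, 0, 1]               # 1 = recurrence per gold standard
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]   # model-predicted probabilities
toy_true_months = [12.0, 30.0, 24.0]          # gold standard time to recurrence
toy_pred_months = [10.0, 33.0, 24.0]          # model-estimated time to recurrence

detection_auroc = auroc(toy_labels, toy_scores)
timing_mae = mean_absolute_error(toy_true_months, toy_pred_months)
```

Under these assumptions, a detection model would be preferred when `detection_auroc` is higher and a timing model when `timing_mae` is lower, mirroring the maximize/minimize criteria described above.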
