Classifying Lung Cancer Severity with Ensemble Machine Learning in Health Care Claims Data

Research on the quality of oncology care and on health outcomes has been limited by the difficulty of identifying cancer stage in health care claims data. Using linked cancer registry and Medicare claims data, we develop a tool for classifying lung cancer patients receiving chemotherapy as having early- vs. late-stage cancer by (i) deploying ensemble machine learning for prediction, (ii) establishing a set of classification rules for the predicted probabilities, and (iii) considering an augmented set of administrative claims data. We find that our ensemble machine learning algorithm, paired with a classification rule defined by the median, substantially outperforms an existing clinical decision tree for this problem, yielding full-sample performance of 93% sensitivity, 92% specificity, and 93% accuracy. This work has the potential for broad applicability as provider organizations, payers, and policy makers seek to measure the quality and outcomes of cancer care and to improve risk adjustment methods.
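As a rough illustration of step (ii), the sketch below applies a median-based classification rule to a vector of predicted probabilities and computes sensitivity, specificity, and accuracy. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the direction of the rule (probabilities above the median are labeled late stage), and the toy data are all hypothetical and are not taken from the paper.

```python
from statistics import median


def classify_by_median(probs):
    """Assumed rule: label 1 (late stage) when the predicted
    probability exceeds the median of all predicted probabilities."""
    cutoff = median(probs)
    return [1 if p > cutoff else 0 for p in probs]


def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "accuracy": (tp + tn) / len(pairs) if pairs else float("nan"),
    }


# Toy data, made up for illustration only (not results from the paper).
probs = [0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.85, 0.90]
y_true = [0, 0, 0, 1, 0, 1, 1, 1]  # 1 = late stage (assumed coding)

y_pred = classify_by_median(probs)
metrics = confusion_metrics(y_true, y_pred)
print(y_pred)
print(metrics)
```

In practice the probabilities would come from the fitted ensemble rather than a hand-written list. Note that a median cutoff labels roughly half of the sample as positive by construction, so it is a natural choice only when the two classes are of similar size.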
