Predicting Lung Cancer Survival Using Probabilistic Reclassification of TNM Editions With a Bayesian Network

PURPOSE The TNM classification system is used for prognosis, treatment, and research. Regular updates potentially break backward compatibility. Reclassification is not always possible, is labor intensive, or requires additional data. We developed a Bayesian network (BN) for reclassifying the 5th, 6th, and 7th editions of the TNM and predicting survival for non–small-cell lung cancer (NSCLC) without training data with known classifications in multiple editions.

METHODS Data were obtained from the Netherlands Cancer Registry (n = 146,084). A BN was designed with nodes for TNM edition and survival, and a group of nodes was designed for all TNM editions, with a group for edition 7 only. Before learning conditional probabilities, priors for relations between the groups were manually specified after analysis of changes between editions. For performance evaluation only, part of the 7th edition test data were manually reclassified. Performance was evaluated using sensitivity, specificity, and accuracy. Two-year survival was evaluated with the receiver operating characteristic area under the curve (AUC), and model calibration was visualized.

RESULTS Manual reclassification of 7th to 6th edition stage group as ground truth for testing was impossible in 5.6% of the patients. Predicting 6th edition stage grouping using 7th edition data and vice versa resulted in average accuracies, sensitivities, and specificities between 0.85 and 0.99. The AUC for 2-year survival was 0.81.

CONCLUSION We have successfully created a BN for reclassifying TNM stage grouping across TNM editions and predicting survival in NSCLC without knowing the true TNM classification in various editions in the training set. We suggest binary prediction of survival is less relevant than predicted probability and model calibration. For research, probabilities can be used for weighted reclassification.
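The abstract does not spell out how weighted reclassification would be carried out. One possible reading, sketched below under stated assumptions, is that each patient contributes to every candidate stage group in proportion to the predicted probability of that group, rather than being assigned to the single most likely group. The function name `weighted_stage_summary`, the input layout, and the toy patients are hypothetical and are not taken from the paper or from the Netherlands Cancer Registry data.

```python
from collections import defaultdict


def weighted_stage_summary(patients):
    """Aggregate soft (probabilistic) stage assignments.

    patients: iterable of (stage_probs, survived) pairs, where stage_probs
    maps a stage-group label to its predicted probability and survived is
    a bool (for example, alive at 2 years).

    Returns {stage: (expected_count, weighted_survival_rate)}.
    """
    weight = defaultdict(float)           # expected number of patients per stage
    survived_weight = defaultdict(float)  # expected number of survivors per stage

    for stage_probs, survived in patients:
        for stage, p in stage_probs.items():
            weight[stage] += p
            if survived:
                survived_weight[stage] += p

    return {
        stage: (w, survived_weight[stage] / w)
        for stage, w in weight.items()
        if w > 0
    }


# Hypothetical toy data: predicted stage-group probabilities in a target
# edition, paired with an observed survival outcome.
patients = [
    ({"IB": 0.5, "IIA": 0.5}, True),
    ({"IB": 1.0}, False),
    ({"IIA": 0.5, "IIB": 0.5}, True),
    ({"IB": 0.5, "IIB": 0.5}, True),
]

summary = weighted_stage_summary(patients)
for stage in sorted(summary):
    count, rate = summary[stage]
    print(f"{stage}: expected n = {count:.1f}, weighted survival = {rate:.2f}")
```

In this sketch, a patient whose reclassification is ambiguous is split across stage groups instead of being forced into one, so uncertainty in the reclassification carries through to the per-stage counts and survival estimates.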
