Persistence of data-driven knowledge to predict breast cancer survival

BACKGROUND: Machine learning models that predict breast cancer survival can improve when they are made specific to the stage of the cancer at diagnosis. However, both the relevance of the clinical parameters to that prediction and the predictive quality of these models may change over time.

OBJECTIVE: To determine whether findings on the influence of clinical parameters and on the performance of machine learning models in predicting breast cancer survival should be considered temporary or permanent and, if temporary, for how long the newly generated knowledge remains valid.

METHODS: Fifteen relevant conclusions from recently published work applying machine learning methods to predict breast cancer survival were identified. Breast cancer data from the SEER database were then used to build a series of data-driven models over time to predict five-year survival. Three different machine learning methods were used, and both stage-specific models and joint models covering all stages were considered. The predictive quality of the models and the importance of the clinical parameters were subjected to a persistence analysis over time to determine the validity and durability of these fifteen conclusions.

RESULTS AND CONCLUSIONS: Only 53% of the conclusions held for the SEER cases from 1988–2009, and only 75% of those held over time. Relevant conclusions, such as the impossibility of improving survival prediction for the most frequent stages with more data, or the importance of cancer grade for predicting the survival of patients with distant metastasis, turned out to be false when subjected to temporal analysis. Our study concludes that data-driven knowledge obtained with machine learning methods must be validated over time before it can be applied in clinical and professional practice.
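The two reported percentages compose: with fifteen conclusions, 53% corresponds to 8 conclusions (8/15), and 75% of those corresponds to 6 (6/8). The sketch below shows that bookkeeping. It is not the authors' code; it assumes a hypothetical criterion under which a conclusion "persists" only if it also holds in every time window in which it is re-tested. The labels C1 to C15 and the per-window outcomes are hypothetical placeholders chosen only to reproduce these counts, not SEER results.

```python
from typing import Dict, List, Tuple


def persistence_summary(
    full_period: Dict[str, bool],
    per_window: Dict[str, List[bool]],
) -> Tuple[int, int, int]:
    """Count conclusions: (total, true on pooled data, true AND persistent).

    Assumed criterion (hypothetical): a conclusion persists only if it also
    holds in every time window in which it was re-tested.
    """
    true_overall = [c for c, holds in full_period.items() if holds]
    persistent = [c for c in true_overall if all(per_window[c])]
    return len(full_period), len(true_overall), len(persistent)


# Hypothetical placeholders: C1..C8 hold on the pooled 1988-2009 data,
# and only C1..C6 also hold in each of three illustrative time windows.
full = {f"C{i}": i <= 8 for i in range(1, 16)}
windows = {c: [True, True, True] if int(c[1:]) <= 6 else [True, False, True]
           for c in full}

total, true_n, persist_n = persistence_summary(full, windows)
print(f"true on pooled data: {true_n}/{total} = {true_n / total:.0%}")
print(f"persistent among those: {persist_n}/{true_n} = {persist_n / true_n:.0%}")
print(f"persistent overall: {persist_n}/{total} = {persist_n / total:.0%}")
```

Under these counts, only 6 of the 15 conclusions (40%) are both true on the pooled data and stable over time.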
