Predicting Invasive Disease-Free Survival for Early Stage Breast Cancer Patients Using Follow-Up Clinical Data

Objective: Breast cancer poses a serious threat to Chinese women, with high morbidity and mortality. The lack of robust prognosis models makes it difficult for doctors to prepare an appropriate treatment plan that may prolong patient survival time. We propose MP4Ei, an alternative prognosis model framework for predicting invasive disease-free survival (iDFS) in early-stage breast cancer patients. MP4Ei performs well in predicting breast cancer relapse or metastasis in Chinese patients within five years. Methods: MP4Ei is built on statistical theory and the gradient boosting decision tree framework. A total of 5246 patients with early-stage (stage I–III) breast cancer, drawn from the clinical research center for breast in West China Hospital of Sichuan University, are eligible for inclusion. Stratified feature selection, combining statistical and ensemble methods, selects 23 of the 89 patient features covering the patients’ demographics, diagnosis, pathology, and therapy. The 23 selected features are then used as input variables to the XGBoost algorithm, with Bayesian parameter tuning and cross-validation, to find the optimal simplified model for five-year iDFS prediction. Results: On the eligible data, with 4196 patients (80%) for training and 1050 patients (20%) for testing, MP4Ei achieves competitive accuracy, with an AUC of 0.8451, a statistically significant advantage (p < 0.05). Conclusion: This work demonstrates a complete iDFS prognosis model with very competitive performance. Significance: The proposed method could be used in clinical practice to predict patients’ prognosis and future survival status, which may help doctors make treatment plans.
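The evaluation protocol in the abstract (an 80/20 train/test split scored by AUC) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code. The function names `stratified_split` and `roc_auc`, the synthetic labels and scores, and the choice to stratify the split by outcome are all assumptions made for illustration. The AUC is computed with the rank-based (Mann-Whitney) formulation, which equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with a similar class ratio in both parts.

    Hypothetical helper: the abstract does not say the split was stratified.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def roc_auc(y_true, scores):
    """Rank-based AUC for binary labels (1 = event, 0 = no event).

    Tied scores receive their average rank, so a tie counts as half a win.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied scores starting at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    # Synthetic cohort (assumption): 1000 patients, 200 with an iDFS event.
    labels = [1] * 200 + [0] * 800
    rng = random.Random(42)
    # Synthetic risk scores: event cases are shifted upward on average.
    scores = [rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in labels]

    train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
    test_auc = roc_auc([labels[i] for i in test_idx],
                       [scores[i] for i in test_idx])
    print(f"train={len(train_idx)} test={len(test_idx)} AUC={test_auc:.4f}")
```

In a real pipeline the scores would come from a model fitted on the training indices only; here they are random draws, so the printed AUC says nothing about MP4Ei itself.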
