Random Forest Modeling for Survival Analysis of Cancer Recurrences

The recurrence of breast cancer is a prevalent problem that decreases patients’ quality of life, places a high burden on the healthcare system, and affects the wellbeing of society. Advanced sensing provides an unprecedented opportunity to increase information visibility and characterize patterns of event occurrences. However, few, if any, previous studies have investigated survival analysis of breast cancer recurrences based on the large amounts of data readily available in the health system. There is a pressing need to leverage these data to identify the factors that play an important role in the recurrence of breast cancer. This paper presents an ensemble method, the random survival forest, for time-to-event analysis of breast cancer recurrences in the Surveillance, Epidemiology, and End Results (SEER) data from 1973 to 2015. Our model characterizes the survival function among patients with and without recurrences of breast cancer. Ensemble models are constructed by sampling and bootstrapping from this large dataset. Experimental results show that the age at recurrence and the time between recurrences approximately follow Gaussian and exponential distributions, with means of $61.35 \pm 14.03$ years and 2.61 years, respectively. In addition, the results show that age, surgery status, tumor stage, and histological grade are significant factors that influence the probability of breast cancer recurrence. The proposed survival analysis approach shows strong potential to help healthcare practitioners with prognosis, treatment, and decision-making for breast cancer recurrences.
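To make the bootstrap-ensemble idea concrete, the following is a minimal Python sketch that averages Kaplan-Meier survival estimates over bootstrap resamples. It is not the paper's random survival forest: it grows no trees and uses no covariates, and it runs on synthetic data rather than SEER records. All function names, the 10-year censoring horizon, the sample size, and the number of resamples are assumptions chosen for illustration; only the 2.61-year exponential mean is taken from the results stated above.

```python
import math
import random


def kaplan_meier(times, events):
    """Kaplan-Meier curve as a list of (event_time, S(t)) steps.

    times  : follow-up time per subject
    events : 1 if the event (e.g. recurrence) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0   # events observed exactly at time t
        leaving = 0    # subjects leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            surv *= 1.0 - observed / n_at_risk
            curve.append((t, surv))
        n_at_risk -= leaving
    return curve


def survival_at(curve, t):
    """Evaluate the step function S(t) defined by a Kaplan-Meier curve."""
    s = 1.0
    for step_time, value in curve:
        if step_time > t:
            break
        s = value
    return s


def bagged_survival(times, events, t, n_boot=200, seed=0):
    """Average S(t) over bootstrap resamples (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(times)
    total = 0.0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        curve = kaplan_meier([times[j] for j in idx], [events[j] for j in idx])
        total += survival_at(curve, t)
    return total / n_boot


def simulate_exponential(n, mean_years, horizon, seed):
    """Synthetic time-to-event data, censored at a fixed horizon (assumed)."""
    rng = random.Random(seed)
    times, events = [], []
    for _ in range(n):
        gap = rng.expovariate(1.0 / mean_years)
        times.append(min(gap, horizon))
        events.append(1 if gap <= horizon else 0)
    return times, events


if __name__ == "__main__":
    times, events = simulate_exponential(1000, 2.61, 10.0, seed=1)
    estimate = bagged_survival(times, events, t=2.61, n_boot=50)
    print(f"bagged S(2.61) = {estimate:.3f}")
    print(f"exponential model exp(-1) = {math.exp(-1):.3f}")
```

Under an exponential model with mean 2.61 years, the survival function is S(t) = exp(-t / 2.61), so the bagged estimate at t = 2.61 should land near exp(-1) on the synthetic data. A random survival forest would replace the single Kaplan-Meier fit per resample with a survival tree whose terminal nodes each carry their own estimate.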
