Predictive models for colorectal cancer recurrence using multi-modal healthcare data

Colorectal cancer recurrence is a major clinical problem: around 30-40% of patients treated with curative-intent surgery will experience a relapse. Proactive prognostication is critical for the early detection and treatment of recurrence. However, the common clinical approach to monitoring recurrence, testing for carcinoembryonic antigen (CEA), has limited prognostic performance. In this paper, we study a series of machine learning and deep learning architectures that exploit heterogeneous healthcare data to predict colorectal cancer recurrence. In particular, we demonstrate three approaches to extracting and integrating features from multiple modalities, including longitudinal and tabular clinical data. Our best model uses a hybrid architecture that takes multi-modal inputs and comprises 1) a Transformer model carefully modified to extract high-quality features from time-series data and 2) a Multi-Layer Perceptron (MLP) that learns features from tabular data, followed by feature integration and classification to predict recurrence. It achieves an AUROC of 0.95, with precision, sensitivity and specificity of 0.83, 0.80 and 0.96 respectively, surpassing all published CEA-based results known to us as well as most commercially available diagnostic assays. Our results could lead to better post-operative management and follow-up of colorectal cancer patients.
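The abstract does not specify how the Transformer was modified, how the two feature sets are integrated, or any layer sizes. The following is therefore only a minimal NumPy sketch of the data flow in such a hybrid model, under stated assumptions: a single vanilla encoder layer (sinusoidal positional encoding, single-head self-attention, feed-forward, residuals with layer norm) with mean pooling over time; a two-layer ReLU MLP for the tabular features; and late fusion by concatenation followed by a logistic output. The class name `HybridRecurrenceModel`, all dimensions, and the random, untrained weights are hypothetical and are not the authors' model.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalise each feature vector (last axis) to zero mean and unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def positional_encoding(seq_len, d_model):
    # Standard sinusoidal encoding: sin on even dimensions, cos on odd ones.
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

class HybridRecurrenceModel:
    """Hypothetical Transformer + MLP late-fusion model (untrained sketch)."""

    def __init__(self, d_ts, d_tab, d_model=16, d_hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        init = lambda *shape: rng.normal(0.0, 0.1, size=shape)
        # Time-series branch: input projection, attention and feed-forward weights.
        self.W_in = init(d_ts, d_model)
        self.W_q, self.W_k, self.W_v = (init(d_model, d_model) for _ in range(3))
        self.W_ff1 = init(d_model, 4 * d_model)
        self.W_ff2 = init(4 * d_model, d_model)
        # Tabular branch: two-layer MLP.
        self.W_t1 = init(d_tab, d_hidden)
        self.W_t2 = init(d_hidden, d_hidden)
        # Classification head over the concatenated (fused) features.
        self.w_out = init(d_model + d_hidden)
        self.b_out = 0.0

    def encode_series(self, x_ts):
        # x_ts: (batch, T, d_ts) -> (batch, d_model)
        d_model = self.W_in.shape[1]
        h = x_ts @ self.W_in + positional_encoding(x_ts.shape[1], d_model)
        q, k, v = h @ self.W_q, h @ self.W_k, h @ self.W_v
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_model)  # (batch, T, T)
        h = layer_norm(h + softmax(scores) @ v)              # attention + residual
        ff = np.maximum(0.0, h @ self.W_ff1) @ self.W_ff2
        h = layer_norm(h + ff)                                # feed-forward + residual
        return h.mean(axis=1)                                 # mean-pool over time

    def encode_tabular(self, x_tab):
        # x_tab: (batch, d_tab) -> (batch, d_hidden)
        h = np.maximum(0.0, x_tab @ self.W_t1)
        return np.maximum(0.0, h @ self.W_t2)

    def predict_proba(self, x_ts, x_tab):
        # Late fusion: concatenate both feature vectors, then a logistic output.
        fused = np.concatenate(
            [self.encode_series(x_ts), self.encode_tabular(x_tab)], axis=1)
        logits = fused @ self.w_out + self.b_out
        return 1.0 / (1.0 + np.exp(-logits))  # recurrence probability per patient

# Usage with random stand-in data: 4 patients, 10 time points of 3 lab values
# (e.g. serial CEA measurements), plus 5 static clinical variables each.
rng = np.random.default_rng(1)
model = HybridRecurrenceModel(d_ts=3, d_tab=5)
probs = model.predict_proba(rng.normal(size=(4, 10, 3)), rng.normal(size=(4, 5)))
print(probs.shape)  # (4,)
```

Concatenation keeps each branch free to learn modality-specific features before the classifier combines them. A practical version would be written in a deep learning framework and trained on labelled recurrence outcomes, which this sketch omits.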
