Predicting colorectal surgical complications using heterogeneous clinical data and kernel methods

OBJECTIVE We developed a learning system that exploits the information in longitudinal Electronic Health Records (EHRs) to predict a common postoperative complication, Anastomosis Leakage (AL), in a data-driven way, by fusing temporal population data from different and heterogeneous sources in the EHRs.

MATERIAL AND METHODS We applied linear and non-linear kernel methods to each data source individually, and used a multiple kernel approach to combine them effectively. To validate the system, we used data from the EHR of the gastrointestinal department at a university hospital.

RESULTS We first investigated the early prediction performance of each data source separately, obtaining Area Under the Curve (AUC) values of 0.83 for processed free text, 0.74 for blood tests, and 0.65 for vital signs. When the heterogeneous data sources were combined using the composite kernel framework, prediction performance increased considerably (AUC 0.92). Finally, posterior probabilities were evaluated for patient risk assessment, as an aid for clinicians to raise alertness at an early stage and act promptly to avoid AL complications.

DISCUSSION Machine-learning statistical models built from EHR data can be useful for predicting surgical complications. Combining EHR-extracted free text, blood sample values, and patient vital signs improves model performance. These results can be used as a framework for preoperative clinical decision support.
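
The composite kernel idea described in the abstract can be illustrated with a minimal sketch: one kernel per data source, combined as a non-negative weighted sum. Everything concrete here is an assumption made for illustration and is not taken from the study: the toy feature values, the choice of a linear kernel for text and RBF kernels for blood tests and vital signs, the `gamma` value, and the fixed weights (a multiple kernel method would normally learn the weights from data).

```python
import math


def linear_kernel(x, y):
    """Dot product between two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an illustrative value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gram_matrix(samples, kernel):
    """Kernel value for every pair of patients in one data source."""
    return [[kernel(a, b) for b in samples] for a in samples]


def composite_gram(grams, weights):
    """Weighted sum of per-source Gram matrices.

    With non-negative weights, a sum of valid kernels is again a valid kernel.
    """
    if len(grams) != len(weights):
        raise ValueError("need exactly one weight per Gram matrix")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    n = len(grams[0])
    return [
        [sum(w * g[i][j] for g, w in zip(grams, weights)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: three patients, three sources (values are made up).
text_features = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0]]  # term counts
blood_features = [[0.1, -0.3], [0.4, 0.2], [1.5, 0.9]]  # standardized lab values
vital_features = [[-0.2, 0.0], [0.3, 0.5], [1.1, 1.4]]  # standardized vital signs

k_text = gram_matrix(text_features, linear_kernel)
k_blood = gram_matrix(blood_features, rbf_kernel)
k_vital = gram_matrix(vital_features, rbf_kernel)

# Assumed fixed weights; not the weights used or learned in the study.
k_composite = composite_gram([k_text, k_blood, k_vital], [0.5, 0.3, 0.2])
```

The resulting matrix `k_composite` could then be passed to any kernel classifier that accepts a precomputed Gram matrix, such as a support vector machine.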
