Predicting malignant nodules by fusing deep features with classical radiomics features

Abstract. Lung cancer has high incidence and mortality rates. Early detection and diagnosis of lung cancer is best achieved with low-dose computed tomography (CT). Classical radiomics features extracted from lung CT images have been shown to predict cancer incidence and prognosis. With the advancement of deep learning and convolutional neural networks (CNNs), deep features can also be extracted from lung CTs for diagnosis and prognosis prediction. Because the number of available images in the medical field is limited, transfer learning can be helpful. Using subsets of participants from the National Lung Screening Trial (NLST), we used a transfer learning approach to differentiate lung cancer nodules from positive controls. We experimented with three different pretrained CNNs for extracting deep features and used five different classifiers. Experiments were also conducted with deep features from different color channels of a pretrained CNN. Selected deep features were combined with radiomics features. A CNN was also designed and trained. Combinations of features from pretrained CNNs, CNNs trained on NLST data, and classical radiomics were used to build classifiers. The best accuracy (76.79%) was obtained using feature combinations. An area under the receiver operating characteristic curve of 0.87 was obtained using a CNN trained on an augmented NLST data cohort.
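The fusion step described in the abstract, combining deep features with radiomics features before classification, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the feature values are synthetic, the vector sizes are arbitrary, and a 1-nearest-neighbor rule stands in for the five classifiers, which the abstract does not name. The functions `fuse`, `fit_scaler`, `apply_scaler`, and `predict_1nn` are hypothetical helpers, not the authors' code.

```python
import math

def fuse(deep, radiomics):
    """Concatenate each nodule's deep-feature vector with its radiomics vector."""
    if len(deep) != len(radiomics):
        raise ValueError("deep and radiomics must describe the same nodules")
    return [list(d) + list(r) for d, r in zip(deep, radiomics)]

def fit_scaler(rows):
    """Compute per-column mean and standard deviation on the training rows."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; use 1.0 to avoid division by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def apply_scaler(row, means, stds):
    """Standardize one row so features on different scales are comparable."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]

def predict_1nn(train_x, train_y, query):
    """Return the label of the training row closest to the query (Euclidean)."""
    dists = [math.dist(x, query) for x in train_x]
    return train_y[dists.index(min(dists))]

# Synthetic example data (assumed): 4 nodules, 3 deep features, 2 radiomics features.
deep_train = [[0.1, 0.9, 0.2], [0.2, 0.8, 0.1], [0.9, 0.1, 0.7], [0.8, 0.2, 0.9]]
radiomics_train = [[4.0, 0.30], [5.0, 0.25], [12.0, 0.80], [14.0, 0.75]]
labels = [0, 0, 1, 1]  # 0 = control nodule, 1 = cancer nodule (assumed encoding)

fused_train = fuse(deep_train, radiomics_train)
means, stds = fit_scaler(fused_train)
scaled_train = [apply_scaler(row, means, stds) for row in fused_train]

# Classify one unseen nodule using scaling statistics from the training set only.
query = fuse([[0.85, 0.15, 0.8]], [[13.0, 0.78]])[0]
predicted = predict_1nn(scaled_train, labels, apply_scaler(query, means, stds))
print(predicted)
```

Standardizing after concatenation matters in a setup like this because radiomics measurements and CNN activations usually live on very different numeric scales, and a distance-based classifier would otherwise be dominated by the larger-valued features.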
