On Forecasting Lung Cancer Patients’ Survival Rates Using 3D Feature Engineering

This thesis focuses on the application of Machine Learning in the healthcare domain, extracting meaningful information from medical image reports using image processing techniques. Rather than a quintessential classification problem, we consider a regression problem in which we aim to predict the survival times of patients after they have been diagnosed with adenocarcinoma, a type of lung cancer. We attempt this by processing chest CT scans to isolate the cancerous nodules and extracting relevant features from the images of the nodules. Our work proposes a feature set that uses textural and statistical measures to characterize the tumours by traits that are not visible to the naked eye. We first consider the data in its 2D form, i.e., pixel data obtained directly from the images themselves, and create a benchmark using 2D Haralick computations. This feature set is then augmented with statistical shape measurements. Furthermore, we offer two additional schemes for feature set generation, the first motivating the second. In the first scheme, we take the 2D feature set and analyze the feature measurements in relation to tumour depth as the tumour progresses through the slices of the scan. Aggregating this knowledge into a single measurement not only significantly improves the regression results, but also demonstrates the advantage of focusing on the prediction of short-term survival timelines as opposed to long-term survival timelines. The second scheme, building on the results of the first, considers the cancer nodule in its 3D entirety rather than as individual images. This results in a large feature space, with over 100 dimensions. To process these, we explore dimensionality reduction techniques, particularly data diagonalization in a block-diagonal matrix manner, to further enhance regression results.
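To make the 2D benchmark and the depth-based scheme more concrete, the following is a minimal sketch, not the thesis's implementation. It builds a grey-level co-occurrence matrix for a single horizontal pixel offset, computes three of Haralick's textural measures (energy, contrast, homogeneity), and then tracks them slice by slice through a nodule. The function names, the choice of offset, and the use of a plain mean as the "single measurement" over depth are all assumptions made for illustration; the abstract does not specify how the measurements are aggregated. In practice, a library such as Mahotas provides a fuller Haralick feature computation.

```python
from collections import Counter


def glcm(image, dx=1, dy=0, symmetric=True):
    """Normalised grey-level co-occurrence probabilities for one pixel offset.

    `image` is a list of rows of integer grey levels (hypothetical input format).
    """
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[(a, b)] += 1
                if symmetric:
                    counts[(b, a)] += 1
    total = sum(counts.values())
    if total == 0:  # image too small for this offset
        return {}
    return {pair: n / total for pair, n in counts.items()}


def haralick_subset(p):
    """Three Haralick textural measures from co-occurrence probabilities."""
    return {
        # angular second moment: high for uniform texture
        "energy": sum(v * v for v in p.values()),
        # weighted by squared grey-level difference: high for sharp local variation
        "contrast": sum((i - j) ** 2 * v for (i, j), v in p.items()),
        # inverse difference moment: high when neighbouring levels are similar
        "homogeneity": sum(v / (1 + (i - j) ** 2) for (i, j), v in p.items()),
    }


def depth_profile(slices):
    """Per-slice features in slice order, plus their mean over depth.

    The mean is an assumed stand-in for the aggregated depth measurement.
    """
    per_slice = [haralick_subset(glcm(s)) for s in slices]
    mean = {k: sum(f[k] for f in per_slice) / len(per_slice) for k in per_slice[0]}
    return per_slice, mean


# Toy example: a uniform slice followed by a checkerboard slice.
flat = [[1, 1], [1, 1]]
checker = [[0, 1], [1, 0]]
per_slice, mean = depth_profile([flat, checker])
print(per_slice)
print(mean)
```

In a real pipeline the slices would be the segmented nodule regions from consecutive CT slices, quantised to a manageable number of grey levels, and several offsets and directions would typically be averaged rather than the single horizontal offset used here.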
