Computer-Aided Lung Nodule Recognition by SVM Classifier Based on Combination of Random Undersampling and SMOTE

In lung cancer computer-aided detection/diagnosis (CAD) systems, classification of regions of interest (ROIs) is often used to detect and diagnose lung nodules accurately. However, unbalanced datasets often degrade classification performance. In this paper, both the minority and the majority classes are resampled to improve generalization ability. We propose a novel SVM classifier combined with random undersampling (RU) and SMOTE for lung nodule recognition. Combining the two resampling methods not only yields a balanced training set but also removes noise and duplicate information from the training samples while retaining useful information, which improves effective data utilization and hence the performance of the SVM algorithm for pulmonary nodule classification on unbalanced data. Eight features, including 2D and 3D features, are extracted for training and classification. Experimental results show that, for different sizes of training datasets, our RU-SMOTE-SVM classifier achieves the highest classification accuracy among the four classifiers compared, and its average classification accuracy is more than 92.94%.
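The resampling idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`random_undersample`, `smote`, `ru_smote`), the per-class `target` size, the neighbour count `k`, and the toy two-feature data are all assumptions made for illustration, and the undersampling here is plain uniform selection with no noise or duplicate filtering.

```python
import random


def random_undersample(majority, n_keep, rng):
    """Keep n_keep majority samples chosen uniformly without replacement."""
    return rng.sample(majority, n_keep)


def smote(minority, n_new, k, rng):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (the basic SMOTE scheme)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the chosen sample, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def ru_smote(majority, minority, target, k=5, seed=0):
    """Balance both classes to `target` samples each (hypothetical helper):
    undersample the majority class, oversample the minority class."""
    rng = random.Random(seed)
    kept = random_undersample(majority, min(target, len(majority)), rng)
    new = smote(minority, max(0, target - len(minority)), k, rng)
    features = kept + minority + new
    labels = [0] * len(kept) + [1] * (len(minority) + len(new))
    return features, labels


# Toy data: 40 non-nodule ROIs (label 0) and 5 nodule ROIs (label 1),
# each described by two made-up features.
data_rng = random.Random(42)
majority = [[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(40)]
minority = [[data_rng.gauss(3.0, 0.5), data_rng.gauss(3.0, 0.5)] for _ in range(5)]

X, y = ru_smote(majority, minority, target=20)
print(len(X), y.count(0), y.count(1))
```

The balanced `X` and `y` would then be used to train an SVM in place of the original unbalanced training set. Meeting in the middle (shrinking the majority class and growing the minority class to a common size) avoids discarding most of the majority samples or generating a very large number of synthetic minority samples.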
