Comparative Performance Analysis of State-of-the-Art Classification Algorithms Applied to Lung Tissue Categorization

In this paper, we compare five common classifier families on their ability to categorize six lung tissue patterns (healthy tissue and five pathologic patterns) in high-resolution computed tomography (HRCT) images of patients affected by interstitial lung diseases (ILD). The evaluated classifiers are naive Bayes, k-nearest neighbor, J48 decision trees, multilayer perceptron, and support vector machines (SVM). The dataset contains 843 regions of interest (ROIs) covering healthy tissue and five pathologic lung tissue patterns, identified by two radiologists at the University Hospitals of Geneva. The correlation within the feature space, which is composed of 39 texture attributes, is studied. A grid search for optimal parameters is carried out for each classifier family. Two complementary metrics are used to characterize classification performance: one based on McNemar's statistical test and one based on global accuracy. SVM achieved the best values for both metrics, with a mean correct prediction rate of 88.3% and high class-specific precision on testing sets of 423 ROIs.
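As an illustration of the McNemar-based comparison mentioned above, the following is a minimal sketch of how two classifiers evaluated on the same test ROIs could be compared. It assumes the common continuity-corrected form of the test with one degree of freedom; the function name `mcnemar_test` and the toy labels are hypothetical and are not taken from the paper, which may use a different variant.

```python
import math


def mcnemar_test(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar test for two classifiers on the same samples.

    Returns (statistic, p_value). Only the discordant samples matter:
    b = A correct and B wrong, c = A wrong and B correct.
    """
    b = sum(1 for t, a, bb in zip(y_true, pred_a, pred_b) if a == t and bb != t)
    c = sum(1 for t, a, bb in zip(y_true, pred_a, pred_b) if a != t and bb == t)
    if b + c == 0:
        # The classifiers never disagree in correctness: no evidence of a difference.
        return 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Toy data (hypothetical): classifier A is always right, B misses 10 of 20 samples.
y_true = [0] * 20
pred_a = [0] * 20
pred_b = [1] * 10 + [0] * 10
stat, p_value = mcnemar_test(y_true, pred_a, pred_b)
print(f"statistic={stat:.2f}, p={p_value:.4f}")
```

A small p-value suggests that the two classifiers differ in error rate on the shared test set, which complements a comparison of global accuracy alone.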
