A Comparative Study of Data Mining Approaches for Bag of Visual Words Based Image Classification

Image classification is one of the most significant and challenging tasks in computer vision. The goal is to build a system capable of assigning an image to the correct label from a collection of image categories. This paper presents and discusses the application of various data mining techniques to image classification based on the Bag of Visual Words (BoVW) feature extraction algorithm. The BoVW model is constructed from grey-level features, namely the Speeded Up Robust Features (SURF) and Maximally Stable Extremal Regions (MSER) descriptors, together with color features, namely color correlograms and the Improved Color Coherence Vector (ICCV). Five data mining techniques, Neural Networks (NN), Decision Trees (DT), Bayesian Network (BN), Discriminant Analysis (DA) and K Nearest Neighbor (KNN), are explored and evaluated on two distinct large datasets: Corel-1000 and COIL-100. The experimental results show that BN and DA outperform the other data mining methods considered in this comparative study. On the Corel-1000 dataset, BN and DA both achieved an average accuracy and specificity of about 99.9%, with an average sensitivity of about 99.5% and 99.4%, respectively. On the COIL-100 dataset, BN and DA both achieved an average accuracy and sensitivity of about 100%, with an average specificity of about 98.5% and 98.9%, respectively.
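The accuracy, sensitivity and specificity figures quoted above are averages over image categories. The abstract does not state how these averages are computed; a minimal sketch, assuming a one-vs-rest reading of a multi-class confusion matrix followed by an unweighted mean over classes, is given below. The function names and the toy confusion matrix are hypothetical and are not taken from the paper.

```python
def one_vs_rest_metrics(confusion):
    """Per-class accuracy, sensitivity and specificity.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j (assumed layout).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    per_class = []
    for k in range(n):
        tp = confusion[k][k]                               # class k predicted as k
        fn = sum(confusion[k]) - tp                        # class k predicted as other
        fp = sum(confusion[i][k] for i in range(n)) - tp   # other predicted as k
        tn = total - tp - fn - fp                          # everything else
        per_class.append({
            "accuracy": (tp + tn) / total,
            "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
        })
    return per_class


def macro_average(per_class):
    """Unweighted mean of each metric over all classes."""
    keys = per_class[0].keys()
    return {key: sum(m[key] for m in per_class) / len(per_class) for key in keys}


# Toy 3-class confusion matrix (illustrative numbers only, not paper results).
toy = [
    [9, 1, 0],
    [0, 10, 0],
    [1, 0, 9],
]
averages = macro_average(one_vs_rest_metrics(toy))
print(averages)
```

Under this reading, the true-negative count for each class grows with the number of categories, so per-class accuracy and specificity can sit close to 100% even when sensitivity is noticeably lower.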
