Machine Learning for Image Classification

Machine learning is a powerful tool for pattern classification. Its goal is to develop methods that automatically detect patterns in data and then use the uncovered patterns to predict future data or other outcomes of interest. Machine learning is thus closely related to statistics and data mining, but differs slightly in emphasis and terminology (Murphy, 2012). It uses statistical theory to build mathematical models, because the core task is making inferences from a sample. A model may be predictive, to make predictions about the future; descriptive, to gain knowledge from data; or both (Alpaydin, 2010).
In recent years, image classification has become a significant topic in image engineering (Zhang, 2013a), driven by the requirements of action recognition (Zheng et al., 2012), emotional image retrieval (Li & Zhang, 2010), scene categorization (Liu & Zhang, 2011), and behavior understanding (Zhang, 2013b). Image classification aims to associate images with semantic labels that represent their contents abstractly. To achieve this goal, various machine learning and pattern recognition techniques can be used (Bishop, 2006). Among the many techniques adopted for image classification, those that use a dictionary learned by sparse coding have recently achieved competitive performance. Sparse coding can reduce the reconstruction error incurred in transforming low-level descriptors into compact mid-level features. However, a dictionary learned by sparse coding alone cannot distinguish between classes, so it is not the optimal dictionary for the classification task.
In this article, some current techniques for image classification and their existing problems are reviewed, some research directions are discussed, and some recent advances are introduced, in particular a novel discriminant dictionary learning method that combines linear discriminant analysis with sparse coding and an integrated dictionary learning technique on manifolds. Their performance is compared with that of other techniques, and quite satisfactory results have been obtained.
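
As a rough illustration of the sparse coding step mentioned above, the sketch below encodes one low-level descriptor over a fixed dictionary by minimizing the reconstruction error plus an L1 penalty. It is a minimal sketch and not the method introduced in this article: the solver (iterative soft-thresholding, ISTA), the function names, the random dictionary, and the values of `lam` and `n_iter` are all assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t (proximal operator of the L1 norm)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(x, D, a, lam):
    """Sparse coding cost: 0.5 * ||x - D a||^2 + lam * ||a||_1."""
    return 0.5 * np.sum((x - D @ a) ** 2) + lam * np.sum(np.abs(a))

def sparse_code(x, D, lam=0.1, n_iter=200):
    """Encode descriptor x over dictionary D with ISTA (illustrative solver choice)."""
    # Step size 1/L, where L is the squared spectral norm of D.
    L = np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)          # gradient of the reconstruction term
        a = soft_threshold(a - grad / L, lam / L)
    return a

# Hypothetical data: a random dictionary with 50 unit-norm atoms in 20 dimensions,
# and a descriptor built from two of those atoms.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)
x = 1.0 * D[:, 3] - 0.5 * D[:, 17]

a = sparse_code(x, D, lam=0.1)
print("reconstruction error:", np.linalg.norm(x - D @ a))
print("non-zero coefficients:", np.count_nonzero(a))
```

The code vector `a` plays the role of the mid-level feature. Nothing in this objective uses class labels, which is why a dictionary learned by sparse coding alone is not tuned for discrimination.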
