A joint appearance-spatial distance for kernel-based image categorization

The goal of image categorization is to classify a collection of unlabeled images into a set of predefined classes to support semantic-level image retrieval. The distance measures used in most existing approaches either ignored the spatial structures or used them in a separate step. As a result, these distance measures achieved only limited success. To address these difficulties, in this paper, we propose a new distance measure that integrates joint appearance-spatial image features. Such a distance measure is computed as an upper bound of an information-theoretic discrimination, and can be computed efficiently in a recursive formulation that scales well to image size. In addition, the upper bound approximation can be further tightened via adaption learning from a universal reference model. Extensive experiments on two widely-used data sets show that the proposed approach significantly outperforms the state-of-the-art approaches.

[1]  Bernard Mérialdo,et al.  A New Approach to Probabilistic Image Modeling with Multidimensional Hidden Markov Models , 2006, Adaptive Multimedia Retrieval.

[2]  Trevor Darrell,et al.  The pyramid match kernel: discriminative classification with sets of image features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[3]  Refractor Vision , 2000, The Lancet.

[4]  Robert M. Gray,et al.  Image classification by a two dimensional hidden Markov model , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[5]  Chiou-Shann Fuh,et al.  Local Ensemble Kernel Learning for Object Category Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Barbara Caputo,et al.  Recognition with local features: the kernel recipe , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[7]  Horace Ho-Shing Ip,et al.  Automatic Semantic Annotation of Images using Spatial Hidden Markov Model , 2006, 2006 IEEE International Conference on Multimedia and Expo.

[8]  Thomas M. Cover,et al.  Elements of Information Theory: Cover/Elements of Information Theory, Second Edition , 2005 .

[9]  Minh N. Do,et al.  Rotation invariant texture characterization and retrieval using steerable wavelet-domain hidden Markov models , 2002, IEEE Trans. Multim..

[10]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[11]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[12]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[13]  Chin-Hui Lee,et al.  Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains , 1994, IEEE Trans. Speech Audio Process..

[14]  Yoram Singer,et al.  Batch and On-Line Parameter Estimation of Gaussian Mixtures Based on the Joint Entropy , 1998, NIPS.

[15]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[16]  Michael Brady,et al.  Saliency, Scale and Image Description , 2001, International Journal of Computer Vision.

[17]  Yixin Chen,et al.  MILES: Multiple-Instance Learning via Embedded Instance Selection , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Jitendra Malik,et al.  SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[19]  Trevor Darrell,et al.  Pyramid Match Kernels: Discriminative Classification with Sets of Image Features (version 2) , 2006 .