An efficient system for combining complementary kernels in complex visual categorization tasks

Recently, there has been growing interest in improving image categorization performance by combining multiple descriptors. However, very few approaches have been proposed for combining features that capture complementary aspects of an image and for evaluating the resulting performance on realistic databases. In this paper, we tackle the problem of combining different feature types (edge and color), and evaluate the performance gain on the very challenging VOC 2009 benchmark. Our contribution is threefold. First, we propose new local color descriptors, unifying edge and color feature extraction within the “Bag of Words” model. Second, we improve the Spatial Pyramid Matching (SPM) scheme to better incorporate spatial information into the similarity measurement. Last but not least, we propose a new combination strategy based on ℓ1 Multiple Kernel Learning (MKL) that simultaneously learns the individual kernel parameters and the kernel combination. Experiments demonstrate the relevance of the proposed approach, which outperforms baseline combination methods while remaining computationally efficient.
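To make the idea of a kernel combination concrete, the following minimal sketch builds one Gram matrix per feature type and forms their convex combination, with non-negative weights that sum to one (the constraint set used in ℓ1 MKL). It is an illustration under stated assumptions, not the method of the paper: the histogram intersection kernel, the random toy histograms, the fixed weights, and the function names `histogram_intersection_kernel` and `combine_kernels` are all hypothetical choices. In particular, the weights are fixed by hand here, whereas the paper learns them together with the kernel parameters.

```python
import numpy as np


def histogram_intersection_kernel(X, Y):
    """Gram matrix of the histogram intersection kernel between rows of X and Y."""
    # Element-wise minimum of every pair of histograms, summed over the bins.
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)


def combine_kernels(kernels, weights):
    """Return sum_m beta_m * K_m with beta_m >= 0 and sum_m beta_m = 1."""
    beta = np.asarray(weights, dtype=float)
    if np.any(beta < 0):
        raise ValueError("weights must be non-negative")
    beta = beta / beta.sum()  # rescale onto the simplex
    return sum(b * K for b, K in zip(beta, kernels))


# Toy data (assumption): 5 images, each with an 8-bin edge histogram and a
# 6-bin color histogram, both normalized to sum to 1.
rng = np.random.default_rng(0)
edge_hist = rng.random((5, 8))
edge_hist /= edge_hist.sum(axis=1, keepdims=True)
color_hist = rng.random((5, 6))
color_hist /= color_hist.sum(axis=1, keepdims=True)

K_edge = histogram_intersection_kernel(edge_hist, edge_hist)
K_color = histogram_intersection_kernel(color_hist, color_hist)

# Hand-picked weights (assumption); an MKL solver would learn these instead.
K = combine_kernels([K_edge, K_color], [0.7, 0.3])
```

The combined matrix `K` can be passed to any kernel classifier that accepts a precomputed Gram matrix. Weights constrained in this way tend to be sparse once learned, which is why ℓ1 MKL can also act as a kernel selection mechanism.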
