Efficient Object Localization with Variation-Normalized Gaussianized Vectors

Effective object localization relies on an efficient and effective search method and on a robust image representation and learning method. Recently, the Gaussianized vector representation has been shown to be effective in several computer vision applications, such as facial age estimation, image scene categorization, and video event recognition. However, all of these tasks are classification or regression problems on whole images. It has not yet been explored how this representation can be applied efficiently to object localization, which reveals the locations and sizes of objects. In this work, we present an efficient object localization approach for the Gaussianized vector representation, following the branch-and-bound search scheme introduced by Lampert et al. [4]. In particular, we design a quality bound for rectangle sets characterized by the Gaussianized vector representation, which enables fast hierarchical search. This bound can be obtained for any rectangle set in the image at little extra computational cost beyond calculating the Gaussianized vector representation for the whole image. Further, we propose incorporating a normalization approach that suppresses the variation within the object class and the background class. Experiments on a multi-scale car dataset show that the proposed object localization approach based on the Gaussianized vector representation outperforms previous work using the histogram-of-keywords representation. The within-class variation normalization approach further boosts the performance. This chapter is an extended version of our paper at the 1st International Workshop on Interactive Multimedia for Consumer Electronics at ACM Multimedia 2009 [5].
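The branch-and-bound search over rectangle sets mentioned above can be illustrated with a minimal sketch. It does not implement the quality bound for the Gaussianized vector representation; as a stand-in, it assumes the rectangle score is a plain sum of per-pixel scores and uses the bound commonly associated with efficient subwindow search for such additive scores: positive scores summed over the largest rectangle in the set, plus negative scores summed over the smallest. The names `ess`, `score_map`, `upper_bound`, `box_sum`, and `integral` are hypothetical and chosen only for this illustration.

```python
import heapq
import numpy as np

def integral(img):
    # Integral image padded with a zero row and column,
    # so box sums need no boundary checks.
    return np.pad(img.cumsum(0).cumsum(1), ((1, 0), (1, 0)))

def box_sum(ii, t, b, l, r):
    # Sum over rows t..b and columns l..r (inclusive); empty boxes give 0.
    if t > b or l > r:
        return 0.0
    return ii[b + 1, r + 1] - ii[t, r + 1] - ii[b + 1, l] + ii[t, l]

def upper_bound(pos_ii, neg_ii, s):
    # s holds intervals for the top, bottom, left, and right edges.
    (t0, t1), (b0, b1), (l0, l1), (r0, r1) = s
    # Largest rectangle in the set: (t0, b1, l0, r1).
    # Smallest rectangle in the set: (t1, b0, l1, r0), possibly empty.
    return box_sum(pos_ii, t0, b1, l0, r1) + box_sum(neg_ii, t1, b0, l1, r0)

def ess(score_map):
    """Best-first branch-and-bound over rectangle sets (assumed additive score)."""
    h, w = score_map.shape
    pos_ii = integral(np.maximum(score_map, 0))
    neg_ii = integral(np.minimum(score_map, 0))
    start = ((0, h - 1), (0, h - 1), (0, w - 1), (0, w - 1))
    # heapq is a min-heap, so bounds are stored negated.
    heap = [(-upper_bound(pos_ii, neg_ii, start), start)]
    while heap:
        neg_bound, s = heapq.heappop(heap)
        widths = [hi - lo for lo, hi in s]
        k = int(np.argmax(widths))
        if widths[k] == 0:
            # The set holds a single rectangle; its bound is its exact score.
            t, b, l, r = (lo for lo, _ in s)
            return -neg_bound, (t, b, l, r)
        # Split the widest interval in half and push both child sets.
        lo, hi = s[k]
        mid = (lo + hi) // 2
        for part in ((lo, mid), (mid + 1, hi)):
            child = s[:k] + (part,) + s[k + 1:]
            heapq.heappush(heap, (-upper_bound(pos_ii, neg_ii, child), child))
```

For example, on an 8-by-10 map filled with -1 except for a block of +2 at rows 2 to 4 and columns 3 to 6, `ess` returns a score of 24 and the box `(2, 4, 3, 6)` as `(top, bottom, left, right)`. Because the first single-rectangle set popped from the queue has an exact score no smaller than every remaining bound, the search returns a global maximum without scoring every rectangle.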

[1] Bill Triggs, et al. Histograms of oriented gradients for human detection. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.

[2] Takeo Kanade, et al. Human Face Detection in Visual Scenes. NIPS, 1995.

[3] Paul A. Viola, et al. Rapid object detection using a boosted cascade of simple features. Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), 2001.

[4] Christoph H. Lampert, et al. Beyond sliding windows: Object localization by efficient subwindow search. 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.

[5] Xiaodan Zhuang, et al. Efficient object localization with gaussianized vector representation. IMCE '09, 2009.

[6] Thomas S. Huang, et al. Face age estimation using patch-based hidden Markov model supervectors. 2008 19th International Conference on Pattern Recognition, 2008.

[7] Mei-Chen Yeh, et al. Fast Human Detection Using a Cascade of Histograms of Oriented Gradients. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), 2006.

[8] Pietro Perona, et al. A Bayesian hierarchical model for learning natural scene categories. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.

[9] Haim H. Permuter, et al. Gaussian mixture models of texture and colour for image database retrieval. 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), 2003.

[10] Dan Roth, et al. Learning to detect objects in images via a sparse, part-based representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004.

[11] Shuicheng Yan, et al. SIFT-Bag kernel for video event analysis. ACM Multimedia, 2008.

[12] Thomas S. Huang, et al. A novel Gaussianized vector representation for natural scene categorization. 2008 19th International Conference on Pattern Recognition, 2008.

[13] Douglas A. Reynolds, et al. Speaker Verification Using Adapted Gaussian Mixture Models. Digit. Signal Process., 2000.

[14] Ming Liu, et al. Regression from patch-kernel. 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.

[15] Cordelia Schmid, et al. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), 2006.

[16] Andreas Stolcke, et al. Generalized Linear Kernels for One-Versus-All Classification: Application to Speaker Recognition. 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, 2006.