A Multiple Kernel Learning Approach to Joint Multi-class Object Detection

Most current methods for multi-class object classification and localization work as independent 1-vs-rest classifiers: they decide whether and where an object is visible in an image purely on a per-class basis. Joint learning of more than one object class would generally be preferable, since it would allow the use of contextual information such as co-occurrence between classes. However, joint learning is usually not employed because of its computational cost. In this paper, we propose a method that combines the efficiency of single-class localization with a subsequent decision process that works jointly for all given object classes. By following a multiple kernel learning (MKL) approach, we automatically obtain a sparse dependency graph of relevant object classes on which to base the decision. Experiments on the PASCAL VOC 2006 and 2007 datasets show that the subsequent joint decision step clearly improves accuracy over single-class detection.
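
To make the idea of a sparse dependency graph concrete, the following is a minimal sketch rather than the method of the paper. It assumes one base kernel per object class and that some MKL solver has already produced non-negative kernel weights for each target class. The abstract specifies neither, and the function names, class names, and weight values are all hypothetical. The sketch forms the weighted kernel combination that MKL generally learns and reads off, for each target class, which classes keep a non-zero weight.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def combine_kernels(kernels: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Return the weighted sum of base kernel matrices, sum_j weights[j] * kernels[j]."""
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per base kernel")
    if any(w < 0 for w in weights):
        raise ValueError("MKL kernel weights are assumed to be non-negative")
    n = len(kernels[0])
    combined = [[0.0] * n for _ in range(n)]
    for kernel, weight in zip(kernels, weights):
        if weight == 0.0:
            continue  # a zero weight means this base kernel is not used at all
        for i in range(n):
            for j in range(n):
                combined[i][j] += weight * kernel[i][j]
    return combined


def dependency_graph(
    class_names: Sequence[str],
    weights_per_target: Dict[str, Sequence[float]],
    tol: float = 1e-6,
) -> Dict[str, List[str]]:
    """For each target class, list the classes whose kernel weight exceeds tol."""
    return {
        target: [name for name, w in zip(class_names, weights) if w > tol]
        for target, weights in weights_per_target.items()
    }


# Hypothetical class names and weights, chosen only for illustration.
classes = ["car", "person", "cow"]
learned = {
    "car": [0.8, 0.2, 0.0],
    "cow": [0.0, 0.1, 0.9],
}
graph = dependency_graph(classes, learned)
```

With these made-up weights, the decision for "car" would depend only on the "car" and "person" kernels, and the decision for "cow" only on "person" and "cow". Zero weights remove edges, which is what makes the graph sparse.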
