Visual Object Detection Using Cascades of Binary and One-Class Classifiers

We describe an efficient approach to visual object detection that uses short cascades of asymmetric ‘one class’ classifiers to quickly reject negatives (windows not centered on an object of the desired class) within a sliding window framework. Current detectors typically use binary discriminants such as Support Vector Machines or Boosting to implement each stage of the cascade. These treat the positive and negative classes symmetrically. We argue that this is suboptimal because object detectors typically see a great many negative windows with extremely diverse contents and only a few positive ones with comparatively coherent contents. We show that asymmetric representations that focus on tightly modeling the extent of the rare, coherent positive class can lead to simpler classifiers and faster rejection. Our cascades use asymmetric classifiers based on simple convex models to progressively tighten the bound on the positive class. They typically start with a conventional linear SVM for initial pruning, followed by a cascade of linear distance-to-hyperplane and interior-of-hypersphere classifiers and finishing with a kernelized hypersphere classifier. We show that the resulting detectors have competitive performance on the Labeled Faces in the Wild dataset and state-of-the-art performance on the FDDB face detection, ESOGU face detection and INRIA Person datasets. The results on the PASCAL VOC 2007 dataset are also respectable given that they use neither object parts nor context. The one-class formulations provide significant reductions in classifier complexity relative to the corresponding two-class ones, making them suitable for real-world applications.
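The stage ordering described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy 2-D parameters, the thresholds, and the choice of an RBF kernel for the final stage are all assumptions made for illustration. It shows only the general idea of early rejection, where a window's feature vector must pass a linear SVM test, then a distance-to-hyperplane test, then a hypersphere test, and finally a kernelized hypersphere test.

```python
import numpy as np

def linear_svm_stage(w, b, thresh=0.0):
    # Binary stage: accept when the SVM score reaches the threshold.
    return lambda x: float(w @ x + b) >= thresh

def hyperplane_distance_stage(w, b, max_dist):
    # One-class stage: accept points lying close to a hyperplane fitted to positives.
    norm = np.linalg.norm(w)
    return lambda x: abs(float(w @ x + b)) / norm <= max_dist

def hypersphere_stage(center, radius):
    # One-class stage: accept points inside a sphere bounding the positives.
    return lambda x: float(np.sum((x - center) ** 2)) <= radius ** 2

def rbf(a, b, gamma):
    # RBF kernel (an assumed choice; the abstract does not name the kernel).
    return float(np.exp(-gamma * np.sum((a - b) ** 2)))

def kernel_hypersphere_stage(support, alpha, radius, gamma):
    # Sphere in feature space with center sum_i alpha_i * phi(s_i).
    # The squared norm of the center is constant, so compute it once.
    const = sum(
        ai * aj * rbf(si, sj, gamma)
        for ai, si in zip(alpha, support)
        for aj, sj in zip(alpha, support)
    )
    def accept(x):
        cross = sum(a * rbf(x, s, gamma) for a, s in zip(alpha, support))
        dist_sq = 1.0 - 2.0 * cross + const  # rbf(x, x) == 1
        return dist_sq <= radius ** 2
    return accept

def run_cascade(stages, x):
    # Return (accepted, index of the rejecting stage or len(stages) if accepted).
    for i, stage in enumerate(stages):
        if not stage(x):
            return False, i
    return True, len(stages)

# Toy 2-D parameters, hand-picked for illustration (not learned from data).
stages = [
    linear_svm_stage(np.array([1.0, 0.0]), 0.0),
    hyperplane_distance_stage(np.array([0.0, 1.0]), -1.0, max_dist=0.5),
    hypersphere_stage(np.array([1.0, 1.0]), radius=1.0),
    kernel_hypersphere_stage([np.array([1.0, 1.0])], [1.0], radius=0.5, gamma=1.0),
]

if __name__ == "__main__":
    for point in ([1.0, 1.2], [-1.0, 1.0], [1.0, 3.0], [2.5, 1.0], [1.8, 1.0]):
        print(point, run_cascade(stages, np.array(point)))
```

In this sketch the cheap linear tests come first, so most negatives never reach the kernelized stage, whose cost grows with the number of support vectors.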
