Seeking the Strongest Rigid Detector

Current state-of-the-art object detection methods describe each class by a set of models trained on discovered sub-classes (so-called "components"), with each model itself composed of a collection of interrelated parts (deformable models). These detectors build upon the now-classic combination of Histograms of Oriented Gradients (HOG) and a linear SVM. In this paper we revisit some of the core assumptions of HOG+SVM and show that, by properly designing the feature pooling, feature selection, preprocessing, and training methods, it is possible to reach top-quality results, at least for pedestrian detection, using a single rigid component. We provide experiments over a large design space that give insights into the design of classifiers, as well as relevant information for practitioners. Our best detector is fully feed-forward, has a single unified architecture, uses only histograms of oriented gradients and colour information in monocular static images, and improves upon 23 other methods on the INRIA, ETH and Caltech-USA datasets, reducing the average miss rate relative to HOG+SVM by more than 30%.
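To make the distinction between a "single rigid component" and deformable part models concrete: a rigid HOG+linear-SVM detector is one fixed template of weights, and each window position is scored as w·x + b. The sketch below is illustrative only and is not the authors' implementation. It assumes per-cell feature vectors (for example HOG and colour channels) have already been computed, and that the template weights and bias come from a trained linear SVM. The function names `rigid_template_scores` and `detections` are hypothetical.

```python
from typing import List, Tuple

# Feature grid indexed as [row][col][channel], e.g. one HOG (+ colour) vector per cell.
Grid = List[List[List[float]]]


def rigid_template_scores(features: Grid, weights: Grid, bias: float) -> List[List[float]]:
    """Score every placement of one rigid linear template: score = w . x + b.

    There are no movable parts: the template cells always stay at fixed
    offsets relative to the window origin (hypothetical helper).
    """
    H, W = len(features), len(features[0])
    h, w = len(weights), len(weights[0])
    scores: List[List[float]] = []
    for y in range(H - h + 1):
        row: List[float] = []
        for x in range(W - w + 1):
            s = bias
            # Dot product between the template and the window's feature cells.
            for dy in range(h):
                for dx in range(w):
                    cell = features[y + dy][x + dx]
                    s += sum(f * k for f, k in zip(cell, weights[dy][dx]))
            row.append(s)
        scores.append(row)
    return scores


def detections(scores: List[List[float]], threshold: float) -> List[Tuple[int, int, float]]:
    """Return (row, col, score) for every window whose score exceeds the threshold."""
    return [(y, x, s) for y, row in enumerate(scores)
            for x, s in enumerate(row) if s > threshold]


# Toy usage: a 3x3 single-channel feature map and a 2x2 all-ones template.
feats = [[[1.0], [0.0], [0.0]],
         [[0.0], [1.0], [0.0]],
         [[0.0], [0.0], [1.0]]]
template = [[[1.0], [1.0]],
            [[1.0], [1.0]]]
scores = rigid_template_scores(feats, template, bias=-1.0)
print(detections(scores, threshold=0.5))
```

A deformable part model would instead add part filters whose positions can shift around the root, with a deformation cost; the rigid variant above keeps every cell at a fixed offset, which is what makes it a single feed-forward linear scoring pass. A real detector would also scan an image pyramid and apply non-maximum suppression, both omitted here.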
