Pedestrian detection: A benchmark

Pedestrian detection is a key problem in computer vision, with several applications including robotics, surveillance and automotive safety. Much of the progress of the past few years has been driven by the availability of challenging public datasets. To continue the rapid rate of innovation, we introduce the Caltech Pedestrian Dataset, which is two orders of magnitude larger than existing datasets. The dataset contains richly annotated video, recorded from a moving vehicle, with challenging images of low resolution and frequently occluded people. We propose improved evaluation metrics, demonstrating that commonly used per-window measures are flawed and can fail to predict performance on full images. We also benchmark several promising detection systems, providing an overview of state-of-the-art performance and a direct, unbiased comparison of existing methods. Finally, by analyzing common failure cases, we help identify future research directions for the field.

[1]  Yoshiaki Shirai,et al.  Detection of the movements of persons from a sparse sequence of TV images , 1983, Pattern Recognition.

[2]  Dariu Gavrila,et al.  Real-time object detection for "smart" vehicles , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[3]  Yang Song,et al.  Towards detection of human motion , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[4]  Paul A. Viola,et al.  Robust Real-time Object Detection , 2001 .

[5]  D. Scharstein,et al.  A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms , 2001, Proceedings IEEE Workshop on Stereo and Multi-Baseline Vision (SMBV 2001).

[6]  David A. Forsyth,et al.  Probabilistic Methods for Finding People , 2001, International Journal of Computer Vision.

[7]  Jitendra Malik,et al.  Learning to detect natural image boundaries using local brightness, color, and texture cues , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  A. Shashua,et al.  Pedestrian detection for driving assistance systems: single-frame classification and system level performance , 2004, IEEE Intelligent Vehicles Symposium, 2004.

[9]  Tomaso A. Poggio,et al.  A Trainable System for Object Detection , 2000, International Journal of Computer Vision.

[10]  Cordelia Schmid,et al.  Human Detection Based on a Probabilistic Assembly of Robust Part Detectors , 2004, ECCV.

[11]  Paul A. Viola,et al.  Detecting Pedestrians Using Patterns of Motion and Appearance , 2005, International Journal of Computer Vision.

[12]  David J. Fleet,et al.  Performance of optical flow techniques , 1994, International Journal of Computer Vision.

[13]  Ramakant Nevatia,et al.  Detection of multiple, partially occluded humans in a single image by Bayesian combination of edgelet part detectors , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[14]  Alexei A. Efros,et al.  Putting Objects in Perspective , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[15]  Navneet Dalal,et al.  Finding People in Images and Videos , 2006 .

[16]  Dariu Gavrila,et al.  An Experimental Study on Pedestrian Classification , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Pietro Perona,et al.  One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Dariu Gavrila,et al.  Multi-cue Pedestrian Detection and Tracking from a Moving Vehicle , 2007, International Journal of Computer Vision.

[19]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[20]  Bernt Schiele,et al.  Robust Object Detection with Interleaved Categorization and Segmentation , 2008, International Journal of Computer Vision.

[21]  David A. Forsyth,et al.  Configuration Estimates Improve Pedestrian Finding , 2007, NIPS.

[22]  Ramakant Nevatia,et al.  Cluster Boosted Tree Classifier for Multi-View, Multi-Pose Object Detection , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[23]  Bernt Schiele,et al.  Towards Robust Pedestrian Detection in Crowded Image Sequences , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  James W. Davis,et al.  Integrating Appearance and Motion Cues for Simultaneous Detection and Segmentation of Pedestrians , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[25]  Angel D. Sappa,et al.  Adaptive Image Sampling and Windows Classification for On-board Pedestrian Detection , 2007 .

[26]  Jan-Michael Frahm,et al.  Differential Camera Tracking through Linearizing the Local Appearance Manifold , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Luc Van Gool,et al.  Depth and Appearance for Mobile Scene Analysis , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[28]  Zhuowen Tu,et al.  Feature Mining for Image Classification , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Luc Van Gool,et al.  Dynamic 3D Scene Analysis from a Moving Vehicle , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Greg Mori,et al.  Detecting Pedestrians by Learning Shapelet Features , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Bernt Schiele,et al.  A Performance Evaluation of Single and Multi-feature People Detection , 2008, DAGM-Symposium.

[32]  Larry S. Davis,et al.  A Pose-Invariant Descriptor for Human Detection and Segmentation , 2008, ECCV.

[33]  Christoph H. Lampert,et al.  Beyond sliding windows: Object localization by efficient subwindow search , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Pietro Perona,et al.  Multiple Component Learning for Object Detection , 2008, ECCV.

[35]  Luc Van Gool,et al.  A mobile vision system for robust multi-person tracking , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  David A. McAllester,et al.  A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Dariu Gavrila,et al.  Pedestrian Detection and Tracking Using a Mixture of View-Based Shape–Texture Models , 2008, IEEE Transactions on Intelligent Transportation Systems.

[38]  Stefan Roth,et al.  People-tracking-by-detection and people-detection-by-tracking , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.