Fast human detection in crowded scenes by contour integration and local shape estimation

The complexity of human detection increases significantly with a growing density of humans populating a scene. This paper presents a Bayesian detection framework using shape and motion cues to obtain a maximum a posteriori (MAP) solution for human configurations consisting of many, possibly occluded pedestrians viewed by a stationary camera. The paper contains two novel contributions for the human detection task: (1) computationally efficient detection based on shape templates, using contour integration by means of integral images which are built by oriented string scans; (2) a non-parametric approach using an approximated version of the shape context descriptor, which generates informative object parts and infers the presence of humans despite occlusions. The outputs of the two detectors are used to generate a spatial configuration of hypothesized human body locations. The configuration is iteratively optimized while taking into account the depth ordering and occlusion status of the hypotheses. The method achieves fast computation times even in complex scenarios with a high density of people. Its validity is demonstrated on a substantial amount of image data using the CAVIAR dataset and our own datasets. Evaluation results and a comparison with the state of the art are presented.
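
The idea behind contour integration with integral images can be illustrated with a minimal sketch. It covers only the simplest case, a horizontal scan direction, and is not the paper's implementation: the abstract does not specify how the oriented string scans are constructed, so the function names `row_integral` and `segment_sum` and the toy edge map are assumptions made for illustration. Once cumulative sums along a scan direction are stored, the edge response accumulated along any straight segment in that direction costs two lookups, regardless of the segment length.

```python
import numpy as np

def row_integral(edge_map):
    """Cumulative sum along each row, padded with a leading zero column.

    Hypothetical helper: integral[r, x] holds the sum of edge_map[r, :x].
    """
    h, w = edge_map.shape
    integral = np.zeros((h, w + 1), dtype=np.float64)
    integral[:, 1:] = np.cumsum(edge_map, axis=1)
    return integral

def segment_sum(integral, row, x_start, x_end):
    """Sum of edge_map[row, x_start:x_end] using two lookups."""
    return integral[row, x_end] - integral[row, x_start]

# Toy binary edge map (assumed data, for illustration only).
edge_map = np.array([[0, 1, 1, 0],
                     [1, 0, 0, 1],
                     [0, 0, 1, 1]], dtype=np.float64)

integral = row_integral(edge_map)
print(segment_sum(integral, 0, 1, 3))  # 2.0
```

A template made of several straight segments could then be scored by adding one such difference per segment. Handling other orientations would require one cumulative table per scan direction, which this sketch leaves out.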
