Bayesian 3D model based human detection in crowded scenes using efficient optimization

In this paper, we address human detection in crowded scenes with a Bayesian 3D model-based method. Human candidates are first nominated by a head detector and a foot detector; optimization is then performed to find the best configuration of the candidates and their corresponding shape models. In each iteration, the mutually related candidates are decomposed into un-occluded and occluded ones, and model matching is performed for the un-occluded candidates. To guide this decomposition, in addition to some obvious cues, we derive a graph that depicts the inter-object relations so that unreasonable decompositions are avoided. The merit of the proposed optimization procedure is that its computational cost is similar to that of greedy optimization methods, while its performance is comparable to that of global optimization approaches. Model matching employs both prior knowledge and image likelihood: the priors include the distribution of individual shape models and a restriction on the inter-object distance in the real world, and the image likelihood is provided by foreground extraction and edge information. After model matching, a validation and rejection strategy based on minimum description length is applied to confirm the candidates that have reliable matching results. The proposed method is tested on both the publicly available CAVIAR dataset and a challenging dataset that we constructed. The experimental results demonstrate the effectiveness of our approach.
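
The iterative decomposition described above can be illustrated with a minimal sketch. It assumes the inter-object relation graph is available as directed "front occludes back" pairs; the function name `peel_unoccluded_layers`, the input format, and the tie-breaking rule for cyclic relations are hypothetical choices made for illustration, not the procedure from the paper. Model matching itself is omitted; the sketch only shows the order in which candidates would become eligible for it.

```python
def peel_unoccluded_layers(candidates, occludes):
    """Group candidates into layers, un-occluded ones first.

    candidates: iterable of hashable, sortable candidate ids.
    occludes:   iterable of (front, back) pairs meaning `front` occludes `back`
                (an assumed encoding of the inter-object relation graph).
    Returns a list of layers; each layer is a sorted list of candidates that
    have no remaining occluder at that iteration.
    """
    # For every candidate, record which other candidates stand in front of it.
    occluders = {c: set() for c in candidates}
    for front, back in occludes:
        occluders[back].add(front)

    remaining = set(occluders)
    layers = []
    while remaining:
        # Un-occluded candidates: none of their occluders is still unprocessed.
        layer = sorted(c for c in remaining if not (occluders[c] & remaining))
        if not layer:
            # Cyclic relations: as an illustrative fallback (an assumption),
            # take the candidate with the fewest remaining occluders.
            layer = [min(sorted(remaining),
                         key=lambda c: len(occluders[c] & remaining))]
        layers.append(layer)
        # In a full pipeline, model matching would run on `layer` here.
        remaining -= set(layer)
    return layers


if __name__ == "__main__":
    print(peel_unoccluded_layers(["a", "b", "c", "d"], [("a", "b"), ("b", "c")]))
```

Each candidate is visited once per layer it waits in, so the cost stays close to that of a greedy pass, while the graph prevents a candidate from being matched before the ones that occlude it.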
