Pedestrian detection in highly crowded scenes using “online” dictionary learning for occlusion handling

Pedestrian detection is one of the most important tasks in video analytics for intelligent surveillance systems. In this paper, we propose a framework that improves the detection performance of a generic pedestrian detector in highly crowded scenes. Generic offline-trained pedestrian detectors usually fail in such scenes because of severe mutual occlusion among pedestrians. In our approach, we first enhance head detection and suppress the detections of other body parts in the deformable part-based model, because the heads of pedestrians are less likely to be occluded in highly crowded scenes. We then use multiple-instance dictionary learning to refine the resulting detection responses. Compared with related work, our approach builds a data-adaptive dictionary (codebook) for the heads of pedestrians and can therefore better handle detection in highly crowded scenes. Experiments on three datasets containing video clips of crowded scenes demonstrate the effectiveness of the proposed approach, which significantly improves on the state-of-the-art detector.
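
The idea of enhancing the head response while suppressing other body parts can be illustrated with a minimal sketch. It assumes a part-based detector that exposes one root score and a list of per-part scores for a candidate window; the function name `reweighted_score` and the weights `head_weight` and `other_weight` are hypothetical choices for illustration, not values or formulas taken from the paper.

```python
def reweighted_score(root_score, part_scores, head_index,
                     head_weight=2.0, other_weight=0.25):
    """Combine root and part responses while emphasizing the head part.

    All weights here are illustrative assumptions, not the paper's values.
    """
    total = root_score
    for i, score in enumerate(part_scores):
        # Up-weight the head part; down-weight parts that are likely occluded.
        weight = head_weight if i == head_index else other_weight
        total += weight * score
    return total


# A window whose head is visible but whose torso and legs are occluded:
# the occluded parts respond negatively and drag the plain sum down.
parts = [0.5, -1.0, -2.0]          # index 0 is assumed to be the head part
plain = 1.0 + sum(parts)           # unweighted sum of root and parts
emphasized = reweighted_score(1.0, parts, head_index=0)
```

With these assumed weights, the occluded pedestrian receives a higher score than under the plain sum, which is the effect the head-emphasis step aims for before any dictionary-based refinement.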
