Robust multiple cameras pedestrian detection with multi-view Bayesian network

Multi-camera pedestrian detection is a challenging problem in surveillance video analysis. Existing approaches may produce "phantoms" (i.e., fake pedestrians) because of heavy occlusion in real surveillance scenarios, and calibration errors and the diverse heights of pedestrians can also severely degrade detection performance. To address these problems, this paper proposes a robust multi-camera pedestrian detection approach based on a multi-view Bayesian network (MvBN) model. Given the preliminary results obtained by any multi-view pedestrian detection method, which comprise both real pedestrians and phantoms, the MvBN models both the occlusion relationships and the homography correspondences among them in all camera views. The removal of phantoms can then be formulated as an MvBN inference problem. Moreover, to reduce the influence of calibration errors and remain robust to the diverse heights of pedestrians, a height-adaptive projection (HAP) method is proposed, which further improves detection performance through a local search in a small neighborhood of the heights and locations of the detected pedestrians. Experimental results on four public benchmarks show that our method markedly outperforms several state-of-the-art algorithms and is robust across different surveillance scenes.

Highlights
- A multi-view Bayesian network is proposed to model pedestrian candidates and their occlusion relationships in all views.
- A parameter learning algorithm is developed for MvBN by using a set of auxiliary, real-valued, continuous variables.
- A height-adaptive projection is proposed to make the final detection robust to synthesis noise and calibration errors.
- Our approach is recognized as the best performer in five PETS evaluations from 2009 to 2013.
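The abstract describes HAP only as a local search over a small neighborhood of heights and locations, so the following is a minimal sketch of that general idea, not the authors' algorithm. The function name `local_search`, the step sizes, the search radius, and the caller-supplied `score` function (which stands in for whatever multi-view consistency measure is actually used) are all assumptions made for illustration.

```python
from itertools import product
from typing import Callable, Tuple

# A candidate is (x, y, height): ground-plane location plus assumed body height.
Candidate = Tuple[float, float, float]


def local_search(
    candidate: Candidate,
    score: Callable[[Candidate], float],
    loc_step: float = 0.1,      # hypothetical location step (e.g. metres)
    height_step: float = 0.05,  # hypothetical height step (e.g. metres)
    radius: int = 2,            # hypothetical number of steps searched per axis
) -> Tuple[Candidate, float]:
    """Exhaustively try nearby (x, y, height) values and keep the best-scoring one."""
    x0, y0, h0 = candidate
    offsets = range(-radius, radius + 1)
    best, best_score = candidate, score(candidate)
    for dx, dy, dh in product(offsets, offsets, offsets):
        trial = (x0 + dx * loc_step, y0 + dy * loc_step, h0 + dh * height_step)
        trial_score = score(trial)
        if trial_score > best_score:
            best, best_score = trial, trial_score
    return best, best_score
```

In use, `score` would rate how well a candidate projected into every camera view agrees with the observed foreground; here it is left to the caller, since the abstract does not specify it.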
