An information fusion framework for person localization via body pose in spectator crowds

Abstract Person localization or segmentation in low resolution crowded scenes is important for person tracking and recognition, action detection and anomaly identification. Due to occlusion and lack of inter-person space, person localization becomes a difficult task. In this work, we propose a novel information fusion framework to integrate a Deep Head Detector and a body pose detector. A more accurate body pose showing limb positions will result in more accurate person localization. We propose a novel Deep Head Detector (DHD) to detect person heads in crowds. The proposed DHD is a fully convolutional neural network and it has shown improved head detection performance in crowds. We modify Deformable Parts Model (DPM) pose detector to detect multiple upper body poses in crowds. We efficiently fuse the information obtained by the proposed DHD and the modified DPM to obtain a more accurate person pose detector. The proposed framework is named as Fusion DPM (FDPM) and it has exhibited improved body pose detection performance on spectator crowds. The detected body poses are then used for more accurate person localization by segmenting each person in the crowd.

[1]  Ivan Laptev,et al.  Density-aware person detection and tracking in crowds , 2011, ICCV.

[2]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  David A. Forsyth,et al.  Improved Human Parsing with a Full Relational Model , 2010, ECCV.

[4]  Zhi Liu,et al.  3D-based Deep Convolutional Neural Network for action recognition with depth sequences , 2016, Image Vis. Comput..

[5]  Yi Yang,et al.  Articulated Human Detection with Flexible Mixtures of Parts , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Massimo Piccardi,et al.  Background subtraction techniques: a review , 2004, 2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No.04CH37583).

[7]  Soon Ki Jung,et al.  Motion-Aware Graph Regularized RPCA for background modeling of complex scenes , 2016, 2016 23rd International Conference on Pattern Recognition (ICPR).

[8]  Brendt Wohlberg,et al.  Fast principal component pursuit via alternating minimization , 2013, 2013 IEEE International Conference on Image Processing.

[9]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Christopher Joseph Pal,et al.  Fully automatic person segmentation in unconstrained video using spatio-temporal conditional random fields , 2016, Image Vis. Comput..

[12]  Larry S. Davis,et al.  Multi-camera Tracking and Segmentation of Occluded People on Ground Plane Using Search-Guided Particle Filtering , 2006, ECCV.

[13]  Shuo Yang,et al.  From Facial Parts Responses to Face Detection: A Deep Learning Approach , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[14]  Arif Mahmood,et al.  Multi-person head segmentation in low resolution crowd scenes using convolutional encoder-decoder framework , 2017 .

[15]  Robert B. Fisher,et al.  Hidden Markov Models for Optical Flow Analysis in Crowds , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[16]  Ben Taskar,et al.  Cascaded Models for Articulated Pose Estimation , 2010, ECCV.

[17]  Shaogang Gong,et al.  Learning human pose in crowd , 2010, MPVA '10.

[18]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[19]  Rainer Lienhart,et al.  An extended set of Haar-like features for rapid object detection , 2002, Proceedings. International Conference on Image Processing.

[20]  Bernt Schiele,et al.  Pictorial structures revisited: People detection and articulated pose estimation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Andrew Zisserman,et al.  Progressive search space reduction for human pose estimation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Arif Mahmood,et al.  Superpixels based Manifold Structured Sparse RPCA for Moving Object Detection , 2017 .

[24]  Christopher Joseph Pal,et al.  Automated person segmentation in videos , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[25]  Mubarak Shah,et al.  Abnormal crowd behavior detection using social force model , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Larry S. Davis,et al.  Non-parametric Model for Background Subtraction , 2000, ECCV.

[27]  Xiaolin Hu,et al.  Joint Training of Cascaded CNN for Face Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Dacheng Tao,et al.  GoDec: Randomized Lowrank & Sparse Matrix Decomposition in Noisy Case , 2011, ICML.

[29]  Andrew Zisserman,et al.  2D Articulated Human Pose Estimation and Retrieval in (Almost) Unconstrained Still Images , 2012, International Journal of Computer Vision.

[30]  Ivan Laptev,et al.  Pose Estimation and Segmentation of Multiple People in Stereoscopic Movies , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Anil K. Jain,et al.  Face Detection in Color Images , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[32]  Bastian Leibe,et al.  Multi-person Tracking with Sparse Detection and Continuous Segmentation , 2010, ECCV.

[33]  Shimon Ullman,et al.  Semantic Hierarchies for Recognizing Objects and Parts , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Peter V. Gehler,et al.  DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Kun Li,et al.  Foreground–Background Separation From Video Clips via Motion-Assisted Matrix Restoration , 2015, IEEE Transactions on Circuits and Systems for Video Technology.

[36]  Huaizu Jiang,et al.  Face Detection with the Faster R-CNN , 2016, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).

[37]  Qingmin Liao,et al.  Depth-images-based pose estimation using regression forests and graphical models , 2015, Neurocomputing.

[38]  Vittorio Ferrari,et al.  Better Appearance Models for Pictorial Structures , 2009, BMVC.

[39]  Xiaowei Zhou,et al.  Moving Object Detection by Detecting Contiguous Outliers in the Low-Rank Representation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Jitendra Malik,et al.  Recovering human body configurations: combining segmentation and recognition , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[41]  Aphrodite Galata,et al.  Mixtures of Gaussian process models for human pose estimation , 2013, Image Vis. Comput..

[42]  Haroon Idrees,et al.  Detecting Humans in Dense Crowds Using Locally-Consistent Scale Prior and Global Occlusion Reasoning , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[45]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  Jonathan Tompson,et al.  MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation , 2014, ACCV.

[47]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[48]  Ben Taskar,et al.  Adaptive pose priors for pictorial structures , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[49]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[50]  Hao Chen,et al.  DCAN: Deep Contour-Aware Networks for Accurate Gland Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Gang Hua,et al.  Supervised Transformer Network for Efficient Face Detection , 2016, ECCV.

[52]  Serge J. Belongie,et al.  Counting Crowded Moving Objects , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[53]  Nicu Sebe,et al.  The S-HOCK dataset: Analyzing crowds at the stadium , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Peiyun Hu,et al.  Finding Tiny Faces , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[56]  Qingmin Liao,et al.  Latent variable pictorial structure for human pose estimation on depth images , 2016, Neurocomputing.

[57]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[58]  Haroon Idrees,et al.  Multi-source Multi-scale Counting in Extremely Dense Crowd Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[59]  Mubarak Shah,et al.  A Lagrangian Particle Dynamics Approach for Crowd Flow Segmentation and Stability Analysis , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.