Robust 3-D Human Detection in Complex Environments With a Depth Camera

Human detection has received great attention over the past few decades, yet it remains a challenging problem. In this paper, we focus on 3-D human detection, i.e., finding human bodies and determining their 3-D coordinates in complex 3-D space using depth data only. Because traditional sliding-window approaches to target localization are time-consuming and recent deep-learning-based object detectors generate too many region proposals, we propose a first stage, candidate head-top locating, that quickly finds plausible head-top locations. In the second stage, we propose a three-channel representation for each candidate region, consisting of a Depth map, a Multiorder depth template, and a Height difference map, so that a neural network pretrained on large-scale, well-annotated datasets can be used to classify the candidate regions. We evaluate our method on four publicly available, challenging datasets. Extensive experimental results demonstrate that the proposed method outperforms state-of-the-art methods while achieving real-time performance.
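The three-channel encoding in the second stage can be illustrated with a minimal sketch. The abstract does not say how each channel is computed or normalized, so the code below assumes the three maps for a candidate region are already available as same-sized 2-D arrays and simply rescales each to 8 bits before stacking them into an image-like array, which is the input layout that networks pretrained on color images typically expect. The function names `to_uint8` and `stack_three_channels` and the min-max rescaling are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def to_uint8(channel):
    """Linearly rescale a 2-D array to [0, 255]; a constant array maps to zeros.

    Min-max rescaling is an assumption made for this sketch only.
    """
    channel = np.asarray(channel, dtype=np.float64)
    lo, hi = channel.min(), channel.max()
    if hi == lo:
        return np.zeros(channel.shape, dtype=np.uint8)
    return np.round(255.0 * (channel - lo) / (hi - lo)).astype(np.uint8)


def stack_three_channels(depth_patch, template_patch, height_diff_patch):
    """Stack three same-sized single-channel maps into an H x W x 3 array.

    The inputs stand in for the depth map, multiorder depth template, and
    height difference map of one candidate region (hypothetical inputs;
    how they are computed is not covered here).
    """
    patches = (depth_patch, template_patch, height_diff_patch)
    if len({np.shape(p) for p in patches}) != 1:
        raise ValueError("all three channels must have the same shape")
    return np.dstack([to_uint8(p) for p in patches])
```

The resulting array has the same layout as an 8-bit color image, so it could be resized and passed to a pretrained classifier in place of an RGB crop.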
