Paper Information - Robust 3-D Human Detection in Complex Environments With a Depth Camera


Human detection has received great attention over the past few decades, yet it remains a challenging problem. In this paper, we focus on 3-D human detection, i.e., finding human bodies and determining their 3-D coordinates in complex 3-D space using depth data only. Because traditional sliding-window approaches to target localization are time-consuming and recent deep-learning-based object detectors generate too many region proposals, we propose a candidate head-top locating stage that quickly finds plausible head-top locations. In the second stage, we propose a representation that encodes three channels of information for each candidate region (a Depth map, a Multiorder depth template, and a Height difference map), so that a neural network pretrained on large-scale, well-annotated datasets can be used to classify the candidate regions. We evaluate our method on four challenging, publicly available datasets. Extensive experimental results demonstrate that the proposed method outperforms state-of-the-art methods while running in real time.
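The three-channel encoding described in the abstract can be illustrated with a minimal sketch. It assumes the three single-channel maps for a candidate region have already been computed elsewhere; how the paper actually builds the multiorder depth template and the height difference map is not stated here, so they are treated as plain input arrays. The function names `to_uint8` and `stack_three_channels`, and the min-max scaling itself, are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def to_uint8(channel: np.ndarray) -> np.ndarray:
    """Min-max scale one single-channel map to the 0..255 range (assumed normalization)."""
    channel = channel.astype(np.float64)
    lo, hi = channel.min(), channel.max()
    if hi == lo:
        # Constant map: return zeros to avoid dividing by zero.
        return np.zeros(channel.shape, dtype=np.uint8)
    return np.round((channel - lo) / (hi - lo) * 255.0).astype(np.uint8)


def stack_three_channels(depth: np.ndarray,
                         depth_template: np.ndarray,
                         height_diff: np.ndarray) -> np.ndarray:
    """Stack three equally sized maps into one H x W x 3 image.

    The result has the same layout as an 8-bit color image, which is the
    input format that networks pretrained on color datasets expect.
    """
    maps = (depth, depth_template, height_diff)
    if len({m.shape for m in maps}) != 1:
        raise ValueError("all three maps must have the same shape")
    return np.stack([to_uint8(m) for m in maps], axis=-1)
```

A candidate region encoded this way could then be resized and passed to any classifier that accepts three-channel images; which network and input size the authors use is not specified in the abstract.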
