Fast human detection in RGB-D images based on color-depth joint feature learning

Human detection in RGB-D images is an important yet very challenging task in computer vision. In this paper, we propose a novel human detection approach for RGB-D images that integrates ROI (region-of-interest) generation, depth-size relationship estimation and a human detector. Our approach has the following advantages: 1) ROI generation and depth-size relationship estimation take full advantage of color and depth information to quickly reject about 70% of negative samples while maintaining a high recall rate; 2) the cascade-structured human detector can seamlessly concatenate features extracted from both color and depth images; and 3) our method detects humans at more than 30 fps on 640 × 480 images on a single laptop CPU without any GPU acceleration. Experiments on challenging public datasets demonstrate the effectiveness of our method.
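The idea behind depth-size filtering can be illustrated with a small sketch. It is not the authors' method: the abstract says the depth-size relationship is *estimated* and gives no formula. This sketch instead assumes a plain pinhole camera model, where an upright person of physical height H at depth Z projects to roughly h = f·H/Z pixels. It also assumes a focal length of about 525 px (a commonly quoted value for Kinect v1 at 640 × 480) and a hypothetical person-height range of 1.0 to 2.1 m. The names `Window`, `depth_size_consistent` and `filter_windows` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """A candidate detection window (hypothetical structure for illustration)."""
    x: int
    y: int
    w: int          # width in pixels
    h: int          # height in pixels
    depth_m: float  # representative depth inside the window, in meters


def expected_height_px(depth_m: float, focal_px: float, person_height_m: float) -> float:
    # Pinhole projection (assumed model): pixel height = f * H / Z.
    return focal_px * person_height_m / depth_m


def depth_size_consistent(win: Window, focal_px: float = 525.0,
                          min_h_m: float = 1.0, max_h_m: float = 2.1) -> bool:
    """Keep a window only if its pixel height fits a person at its depth.

    focal_px, min_h_m and max_h_m are assumed values, not from the paper.
    """
    if win.depth_m <= 0.0:
        # Missing depth (e.g. a 0 reading) gives no size constraint; reject here.
        return False
    lo = expected_height_px(win.depth_m, focal_px, min_h_m)
    hi = expected_height_px(win.depth_m, focal_px, max_h_m)
    return lo <= win.h <= hi


def filter_windows(windows, **kwargs):
    """Return only the windows whose size is plausible for their depth."""
    return [w for w in windows if depth_size_consistent(w, **kwargs)]


if __name__ == "__main__":
    candidates = [
        Window(100, 50, 120, 300, 3.0),  # ~300 px tall at 3 m: plausible
        Window(400, 80, 40, 100, 3.0),   # far too small for a person at 3 m
        Window(200, 60, 40, 100, 0.0),   # no depth reading
    ]
    kept = filter_windows(candidates)
    print(len(kept))
```

At 3 m the assumed range gives about 175 to 367 px, so only the first candidate survives. Because the check costs a couple of multiplications per window, a test of this kind can run before any feature extraction, which fits the abstract's goal of rejecting negatives early.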
