Learning to Detect and Track People in RGBD Data
Introduction
People detection and tracking is an important and fundamental component for many robots, interactive systems and intelligent vehicles. Previous works have used cameras and 2D and 3D range finders for this task. In this paper, we present a 3D people detection and tracking approach using RGB-D data. Given the richness of the data, we learn target appearance models for the purpose of improved detection and data association. This is a new aspect for range-based target tracking, which usually deals with objects of identical appearance. To this end, we take an online boosting approach similar to [2] and learn a strong target classifier based on three types of RGB-D features. This results in an online people detector, which is combined with a novel people detector based on a generic learned model; both are integrated into a multi-hypothesis tracking system. The new generic detector, called Combo-HOD, fuses the image and depth information in the data. It is composed of the HOG detector (Histogram of Oriented Gradients) [1] applied to the RGB image and a newly introduced HOD approach (Histogram of Oriented Depths), inspired by the former, applied to the depth image. HOD locally encodes the direction of depth changes and relies on a depth-informed scale-space search that leads to a 3-fold acceleration of the detection process. Combo-HOD is general in that it relies neither on background learning nor on a ground plane assumption. For the evaluation, we collect RGB-D data in a populated indoor environment with a setup of three Microsoft Kinect sensors with a joint field of view. The experiments demonstrate reliable 3D detection and tracking of people in RGB-D data up to 8 m from the sensor and further show how the online detector improves the overall tracking performance. This paper advances the state of the art in the following aspects.
First, we address the novel problem of detecting people in RGB-D data at distances far beyond the recommended sensor operating range (called the adequate play space, see Fig. 2); second, we perform tracking of people in 3D data with a multi-hypothesis tracker (MHT); and third, we propose an online learning method for target appearances and integrate it into the MHT.
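The two ideas behind HOD described above (encoding the local direction of depth changes, and using depth to decide at which scale to search) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 9-bin unsigned-orientation layout borrowed from HOG, the L2 normalization, the pinhole camera model, and the default focal length and person height are all assumptions made for illustration.

```python
import numpy as np


def expected_person_height_px(depth_m, focal_px=525.0, person_height_m=1.75):
    """Pixel height of an upright person at the given depth (pinhole model).

    focal_px and person_height_m are assumed values, not taken from the paper.
    With depth available, only window sizes near this height need to be tested,
    instead of scanning every scale of an image pyramid.
    """
    return focal_px * person_height_m / depth_m


def depth_orientation_histogram(depth_cell, n_bins=9):
    """Magnitude-weighted histogram of depth-gradient orientations for one cell.

    Hypothetical HOD-style descriptor: the same construction as a HOG cell,
    but computed on depth values instead of image intensities.
    """
    # np.gradient returns derivatives along rows (y) first, then columns (x).
    gy, gx = np.gradient(depth_cell.astype(np.float64))
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation folded into [0, pi).
    orientation = np.mod(np.arctan2(gy, gx), np.pi)
    # Clamp guards against rounding that would push a value into bin n_bins.
    bins = np.minimum((orientation / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist


if __name__ == "__main__":
    # Synthetic cell whose depth increases from left to right.
    ramp = np.tile(np.arange(8, dtype=np.float64), (8, 1))
    print(depth_orientation_histogram(ramp))
    # Under the assumed camera parameters, a person far from the sensor
    # occupies a much smaller window than one close to it.
    print(expected_person_height_px(2.0), expected_person_height_px(8.0))
```

In this sketch the depth value at a candidate location fixes the expected window size directly, which is one plausible reading of how a depth-informed search avoids testing every scale; the actual search strategy and its reported 3-fold speed-up are specific to the paper and are not reproduced here.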
[1] Bill Triggs, et al. Histograms of oriented gradients for human detection. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.
[2] Horst Bischof, et al. On-line Boosting and Vision. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), 2006.