Tracking people in 3D using a bottom-up top-down detector

People detection and tracking is a key component for robots and autonomous vehicles in human environments. While prior work mainly employed image or 2D range data for this task, in this paper, we address the problem using 3D range data. In our approach, a top-down classifier selects hypotheses from a bottom-up detector, both based on sets of boosted features. The bottom-up detector learns a layered person model from a bank of specialized classifiers for different height levels of people that collectively vote into a continuous space. Modes in this space represent detection candidates that each postulate a segmentation hypothesis of the data. In the top-down step, the candidates are classified using features that are computed in voxels of a boosted volume tessellation. We learn the optimal volume tessellation as it enables the method to stably deal with sparsely sampled and articulated objects. We then combine the detector with tracking in 3D for which we take a multi-target multi-hypothesis tracking approach. The method neither needs a ground plane assumption nor relies on background learning. The results from experiments in populated urban environments demonstrate 3D tracking and highly robust people detection up to 20 m with equal error rates of at least 93%.

[1]  Roland Siegwart,et al.  A Layered Approach to People Detection in 3D Range Data , 2010, AAAI.

[2]  Dorin Comaniciu,et al.  Mean Shift: A Robust Approach Toward Feature Space Analysis , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[3]  D.M. Mount,et al.  An Efficient k-Means Clustering Algorithm: Analysis and Implementation , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[4]  A. Howard,et al.  Results from a Real-time Stereo-based Pedestrian Detection System on a Moving Vehicle , 2009 .

[5]  Christoph Mertz,et al.  Pedestrian Detection and Tracking Using Three-dimensional LADAR Data , 2010, Int. J. Robotics Res..

[6]  Thierry Chateau,et al.  Pedestrian detection method using a multilayer laserscanner: Application in urban environment , 2008, 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[7]  Frédéric Jurie,et al.  Sampling Strategies for Bag-of-Features Image Classification , 2006, ECCV.

[8]  Andrew E. Johnson,et al.  Using Spin Images for Efficient Object Recognition in Cluttered 3D Scenes , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[10]  Ryo Kurazume,et al.  Multi-Part People Detection Using 2D Range Data , 2010, Int. J. Soc. Robotics.

[11]  Wolfram Burgard,et al.  People Tracking with Mobile Robots Using Sample-Based Joint Probabilistic Data Association Filters , 2003, Int. J. Robotics Res..

[12]  Wolfram Burgard,et al.  Efficient people tracking in laser range data using a multi-hypothesis leg-tracker with adaptive occlusion probabilities , 2008, 2008 IEEE International Conference on Robotics and Automation.

[13]  Wolfram Burgard,et al.  Using Boosted Features for the Detection of People in 2D Range Data , 2007, Proceedings 2007 IEEE International Conference on Robotics and Automation.

[14]  Dariu Gavrila,et al.  Monocular Pedestrian Detection: Survey and Experiments , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Erwin Prassler,et al.  Fast and robust tracking of multiple moving objects with a laser range finder , 2001, Proceedings 2001 ICRA. IEEE International Conference on Robotics and Automation (Cat. No.01CH37164).

[16]  Ingemar J. Cox,et al.  An Efficient Implementation of Reid's Multiple Hypothesis Tracking Algorithm and Its Evaluation for the Purpose of Visual Tracking , 1996, IEEE Trans. Pattern Anal. Mach. Intell..

[17]  Shin'ichi Yuta,et al.  Fusion of double layered multiple laser range finders for people detection from a mobile robot , 2008, 2008 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems.

[18]  B. Schiele,et al.  Combined Object Categorization and Segmentation With an Implicit Shape Model , 2004 .

[19]  Bernt Schiele,et al.  Pedestrian detection in crowded scenes , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[20]  Tomaso A. Poggio,et al.  Pedestrian detection using wavelet templates , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[21]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[22]  Roland Siegwart,et al.  Multimodal People Detection and Tracking in Crowded Scenes , 2008, AAAI.

[23]  Rainer Stiefelhagen,et al.  Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics , 2008, EURASIP J. Image Video Process..

[24]  Universityof SouthernCalifornia LosAngeles Laser-based People Tracking , 2002 .