Multimodal People Detection and Tracking in Crowded Scenes

This paper presents a novel method for detecting and tracking people that relies on multimodal sensor fusion of 2D laser range data and camera images. The points in each laser scan are clustered using a novel graph-based method, and an SVM-based version of the cascaded AdaBoost classifier is trained on a set of geometric features of these clusters. In the detection phase, the classified laser data is projected into the camera image to define a region of interest for the vision-based people detector. This detector is a fast version of the Implicit Shape Model (ISM). It learns an appearance codebook of local SIFT descriptors from a set of hand-labeled pedestrian images and uses this codebook in a voting scheme in which matched descriptors vote for the centers of detected people. The extension to the standard ISM consists of a fast and detailed analysis of the spatial distribution of the voters for each detected person. Each detected person is tracked using a greedy data association method and multiple Extended Kalman Filters with different motion models, which allows the tracker to cope with a variety of motion patterns. The tracker is updated asynchronously by detections from both the laser and the camera data. Experiments conducted in real-world outdoor scenarios with crowds of pedestrians demonstrate the usefulness of our approach.
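The abstract does not spell out the graph-based clustering or the exact geometric feature set. As a minimal sketch, assuming a plain distance threshold, the scan points can be treated as nodes of a graph whose edges join points closer than `max_gap`, with clusters taken as the connected components of that graph. A few simple per-cluster features of the kind a boosted classifier could be trained on are then computed. The names `scan_to_points`, `cluster_points`, `cluster_features`, the threshold value, and the feature choice (point count, width, spread) are all hypothetical illustrations, not the authors' method.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def scan_to_points(ranges: List[float], angle_min: float, angle_step: float) -> List[Point]:
    """Convert a 2D laser scan (one range reading per beam) to x/y points in the sensor frame."""
    points = []
    for i, r in enumerate(ranges):
        theta = angle_min + i * angle_step
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points


def cluster_points(points: List[Point], max_gap: float) -> List[List[int]]:
    """Return clusters (lists of point indices) as the connected components of a
    graph whose edges join every pair of points closer than max_gap.
    This is a generic stand-in, not the paper's graph-based method."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < max_gap:
                parent[find(i)] = find(j)  # merge the two components

    components: Dict[int, List[int]] = {}
    for i in range(len(points)):
        components.setdefault(find(i), []).append(i)
    return sorted(components.values(), key=lambda c: c[0])


def cluster_features(points: List[Point], cluster: List[int]) -> Dict[str, float]:
    """Illustrative geometric features of one cluster (an assumed, not the paper's, feature set)."""
    pts = [points[i] for i in cluster]
    n = len(pts)
    cx = sum(p[0] for p in pts) / n
    cy = sum(p[1] for p in pts) / n
    return {
        "num_points": n,
        # Distance between the first and last point in scan order.
        "width": math.dist(pts[0], pts[-1]),
        # Root-mean-square distance of the points from the cluster centroid.
        "spread": math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in pts) / n),
    }


points = [(0.0, 0.0), (0.05, 0.0), (0.10, 0.0), (2.0, 0.0), (2.05, 0.0)]
clusters = cluster_points(points, max_gap=0.2)
print(clusters)  # → [[0, 1, 2], [3, 4]]
print(cluster_features(points, clusters[0]))
```

The all-pairs loop is quadratic in the number of points, which keeps the sketch short. Because scan points arrive ordered by bearing, a practical variant might compare only neighboring beams.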

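The greedy data association used by the tracker can be illustrated with a minimal sketch. It assumes each track's position has already been predicted, here with a single constant-velocity model standing in for the paper's bank of Extended Kalman Filters (covariance propagation and the measurement update are omitted). Detections are then assigned greedily: all track-detection pairs inside a gating distance are sorted by Euclidean distance and accepted closest-first, skipping any track or detection that is already taken. The names `predict_constant_velocity`, `greedy_associate`, and the `gate` parameter are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
State = Tuple[float, float, float, float]  # (x, y, vx, vy), a hypothetical track state


def predict_constant_velocity(state: State, dt: float) -> Point:
    """Predict a track's position after dt seconds under a constant-velocity model."""
    x, y, vx, vy = state
    return (x + vx * dt, y + vy * dt)


def greedy_associate(
    predicted: Dict[int, Point], detections: List[Point], gate: float
) -> Tuple[Dict[int, int], List[int]]:
    """Greedily pair predicted track positions with detections.

    Returns (assignment, unassigned): assignment maps track id -> detection
    index, and unassigned lists the detection indices matched to no track.
    """
    # Collect every (distance, track, detection) pair that passes the gate.
    candidates = []
    for track_id, pos in predicted.items():
        for det_idx, det in enumerate(detections):
            dist = math.dist(pos, det)
            if dist <= gate:
                candidates.append((dist, track_id, det_idx))
    candidates.sort()  # closest pairs first

    assignment: Dict[int, int] = {}
    used = set()
    for _, track_id, det_idx in candidates:
        # Each track and each detection may be used at most once.
        if track_id in assignment or det_idx in used:
            continue
        assignment[track_id] = det_idx
        used.add(det_idx)

    unassigned = [i for i in range(len(detections)) if i not in used]
    return assignment, unassigned


# Example: two tracks and three detections, one of them far from both tracks.
tracks = {1: (0.0, 0.0, 1.0, 0.0), 2: (5.0, 0.0, 0.0, 0.0)}
predicted = {tid: predict_constant_velocity(s, dt=0.1) for tid, s in tracks.items()}
detections = [(5.1, 0.0), (0.12, 0.05), (10.0, 10.0)]
assignment, unassigned = greedy_associate(predicted, detections, gate=0.5)
print(assignment, unassigned)  # → {1: 1, 2: 0} [2]
```

Detections left unassigned would typically seed new tracks. Greedy assignment is not globally optimal (unlike, for example, the Hungarian algorithm), but it is simple and fast.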