Combined image- and world-space tracking in traffic scenes

Tracking in urban street scenes plays a central role in autonomous systems such as self-driving cars. Most current vision-based tracking methods operate in the image domain. Other approaches, e.g., those based on LIDAR and radar, track purely in 3D. While some vision-based methods invoke 3D information in parts of their pipeline, and some 3D-based methods use image-based information in individual components, we propose to use image- and world-space information jointly throughout our method. We present our tracking pipeline as a 3D extension of image-based tracking. From enhancing the detections with 3D measurements to reporting the position of every tracked object, we use world-space 3D information at every stage of processing. We achieve this with a novel coupled 2D-3D Kalman filter, combined with a conceptually clean and extensible hypothesize-and-select framework. Our approach matches the current state of the art on the official KITTI benchmark, which performs evaluation in the 2D image domain only. Further experiments show significant improvements in 3D localization precision when our coupled 2D-3D tracking is enabled.
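To make the idea of fusing image- and world-space measurements in one filter concrete, the following is a minimal sketch and not the coupled 2D-3D Kalman filter of the paper, whose state and update equations are not given in this abstract. It assumes a generic extended Kalman filter with a constant-velocity state on the ground plane, a pinhole camera with hypothetical intrinsics `F_PX` and `CX`, an image measurement in every frame, and a 3D measurement only in every second frame. All names, noise values, and the simulated trajectory are illustrative assumptions.

```python
import numpy as np

# Hypothetical pinhole intrinsics (pixels); not taken from the paper.
F_PX, CX = 700.0, 600.0


def predict(x, P, dt, q=1.0):
    """Constant-velocity prediction for the state [X, Z, vX, vZ]."""
    F = np.eye(4)
    F[0, 2] = F[1, 3] = dt
    Q = q * np.diag([dt**2, dt**2, dt, dt])  # crude, assumed process noise
    return F @ x, F @ P @ F.T + Q


def _update(x, P, z, h, H, R):
    """Standard (extended) Kalman update with predicted measurement h."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    return x + K @ (z - h), (np.eye(len(x)) - K @ H) @ P


def update_3d(x, P, z_xz, sigma=0.5):
    """World-space update: direct observation of the position (X, Z)."""
    H = np.zeros((2, 4))
    H[0, 0] = H[1, 1] = 1.0
    return _update(x, P, z_xz, H @ x, H, sigma**2 * np.eye(2))


def update_2d(x, P, u, sigma=2.0):
    """Image-space update: observed image column u = F_PX * X / Z + CX.

    The projection is nonlinear, so H is its Jacobian at the current
    estimate. Assumes the object is in front of the camera (Z > 0).
    """
    X, Z = x[0], x[1]
    h = np.array([F_PX * X / Z + CX])
    H = np.array([[F_PX / Z, -F_PX * X / Z**2, 0.0, 0.0]])
    return _update(x, P, np.array([u]), h, H, sigma**2 * np.eye(1))


def run_demo(steps=20, dt=0.1):
    """Track a simulated object with noise-free measurements."""
    truth = np.array([2.0, 20.0, 0.0, -5.0])  # approaching the camera
    x = np.array([1.5, 22.0, 0.0, 0.0])       # deliberately wrong start
    P = np.diag([4.0, 4.0, 25.0, 25.0])
    for k in range(steps):
        truth[:2] += dt * truth[2:]
        x, P = predict(x, P, dt)
        u = F_PX * truth[0] / truth[1] + CX
        x, P = update_2d(x, P, u)             # image measurement every frame
        if k % 2 == 0:                        # 3D measurement every other frame
            x, P = update_3d(x, P, truth[:2].copy())
    return x, truth


if __name__ == "__main__":
    est, truth = run_demo()
    print("estimate:", est[:2], "truth:", truth[:2])
```

In this sketch both measurement types correct the same world-space state, so frames without a 3D measurement are still constrained by the image observation. A tracker that filters only in the image domain, or only in 3D, would leave one of the two sources unused.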
