Example-Based 3D Trajectory Extraction of Objects From 2D Videos

For semantic analysis of activities and events in videos, it is important to capture the spatio-temporal relations among objects in 3D space. In this paper, we present a probabilistic method that extracts 3D trajectories of objects from 2D videos captured by a moving monocular camera. Existing methods rely on restrictive assumptions; our method extracts 3D trajectories under far fewer restrictions by adopting new example-based techniques that compensate for the lack of information. Specifically, we estimate the focal length of the camera from similar candidates and use it to compute the depths of detected objects. In contrast to other 3D trajectory extraction methods, our method can process videos taken from a stable camera as well as from an uncalibrated moving camera. To this end, we modify Reversible Jump Markov Chain Monte Carlo particle filtering to make it more suitable for camera odometry without relying on geometrical feature points. Moreover, our method reduces computation time by using keypoint matching to decrease the number of object detections. Finally, we evaluate our method on known data sets, showing the robustness of our system and demonstrating its efficiency in dealing with different kinds of videos.
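
To illustrate how an estimated focal length can yield the depth of a detected object, the following is a minimal sketch under the standard pinhole camera model. It is not the procedure of this paper: the known real-world object height, the principal point (cx, cy), and the function names depth_from_height and back_project are assumptions introduced only for this example.

```python
def depth_from_height(focal_px: float, real_height_m: float, pixel_height: float) -> float:
    """Pinhole relation Z = f * H / h.

    focal_px      -- focal length in pixels (assumed already estimated)
    real_height_m -- assumed real-world height of the object in meters
    pixel_height  -- height of the object's bounding box in pixels
    """
    if pixel_height <= 0:
        raise ValueError("pixel_height must be positive")
    return focal_px * real_height_m / pixel_height


def back_project(u: float, v: float, depth: float,
                 focal_px: float, cx: float, cy: float) -> tuple:
    """Back-project pixel (u, v) at the given depth to camera coordinates.

    (cx, cy) is the principal point, assumed here to be known
    (for example, the image center).
    """
    x = (u - cx) * depth / focal_px
    y = (v - cy) * depth / focal_px
    return (x, y, depth)


if __name__ == "__main__":
    # Hypothetical values: a 1.7 m tall pedestrian spanning 170 pixels,
    # seen by a camera with a 1000-pixel focal length.
    z = depth_from_height(1000.0, 1.7, 170.0)
    print(z)
    print(back_project(740.0, 360.0, z, 1000.0, 640.0, 360.0))
```

Applying the back-projection to an object's image position in each frame gives a sequence of 3D points in camera coordinates; turning these into a trajectory in a common world frame additionally requires the camera motion between frames.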
