Extracting 3D Trajectories of Objects from 2D Videos using Particle Filter

Depth estimation recovers depth information from a 2D image or video, in which the original 3D space has been projected onto an image plane. This paper introduces a novel extension of depth estimation to the video domain: we extract 3D trajectories, each of which represents the movement of one object through 3D space. Such 3D trajectories are useful for characterising spatio-temporal relations between objects in video event detection. We extract 3D trajectories by combining depth estimation and object detection results, and the major problem is the inconsistency between these results. For example, significantly different depths may be estimated within the region of a single object, and an object whose region is well delineated by the estimated depths may be missed by object detection. To overcome this, we first initialise the 3D position of an object using the frame with the highest consistency between the depth estimation and object detection results. Then, we track the object in 3D space using a particle filter, where the 3D position of the object is modelled as a hidden state that generates its 2D visual appearance. Experimental results demonstrate the effectiveness of our method.
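The tracking step can be illustrated with a minimal sketch of a particle filter whose hidden state is a 3D position. This is not the paper's implementation, and everything specific in it is an assumption made for illustration: the pinhole camera constants (`FOCAL`, `CX`, `CY`), the random-walk motion model, and an observation that consists of a detected 2D centre plus a noisy depth value rather than full visual appearance. The initial 3D position is taken as given, standing in for the initialisation from the most consistent frame.

```python
import math
import random

# Hypothetical pinhole camera parameters (not taken from the paper).
FOCAL = 500.0           # focal length in pixels
CX, CY = 320.0, 240.0   # principal point in pixels


def project(p):
    """Project a 3D point (x, y, z) in camera coordinates to a pixel (u, v)."""
    x, y, z = p
    return (FOCAL * x / z + CX, FOCAL * y / z + CY)


def likelihood(p, obs_uv, obs_depth, sigma_px=10.0, sigma_z=0.3):
    """Gaussian score of a particle against an observed 2D centre and depth."""
    u, v = project(p)
    d_px = ((u - obs_uv[0]) ** 2 + (v - obs_uv[1]) ** 2) / sigma_px ** 2
    d_z = (p[2] - obs_depth) ** 2 / sigma_z ** 2
    return math.exp(-0.5 * (d_px + d_z))


def step(particles, obs_uv, obs_depth, rng, motion_sigma=0.05):
    """One predict / weight / resample cycle; returns (particles, estimate)."""
    # Predict: assumed random-walk motion; depth is kept strictly positive.
    moved = [
        (x + rng.gauss(0.0, motion_sigma),
         y + rng.gauss(0.0, motion_sigma),
         max(0.1, z + rng.gauss(0.0, motion_sigma)))
        for x, y, z in particles
    ]
    # Weight: compare each hypothesised 3D position with the 2D observation.
    weights = [likelihood(p, obs_uv, obs_depth) for p in moved]
    total = sum(weights)
    if total == 0.0:
        # All particles are implausible: fall back to uniform weights.
        weights = [1.0] * len(moved)
        total = float(len(moved))
    weights = [w / total for w in weights]
    # Estimate: weighted mean of the particle positions.
    estimate = tuple(
        sum(w * p[i] for w, p in zip(weights, moved)) for i in range(3)
    )
    # Resample: draw particles in proportion to their weights.
    resampled = rng.choices(moved, weights=weights, k=len(moved))
    return resampled, estimate


def run_demo(seed=0, n_particles=500, n_frames=30):
    """Track a synthetic object; returns (final estimate, true position)."""
    rng = random.Random(seed)
    truth = (1.0, 0.5, 4.0)  # assumed initial 3D position
    particles = [
        tuple(c + rng.gauss(0.0, 0.2) for c in truth)
        for _ in range(n_particles)
    ]
    estimate = truth
    for _ in range(n_frames):
        # The object drifts sideways and away from the camera.
        truth = (truth[0] + 0.03, truth[1], truth[2] + 0.02)
        u, v = project(truth)
        obs_uv = (u + rng.gauss(0.0, 2.0), v + rng.gauss(0.0, 2.0))
        obs_depth = truth[2] + rng.gauss(0.0, 0.1)
        particles, estimate = step(particles, obs_uv, obs_depth, rng)
    return estimate, truth


if __name__ == "__main__":
    est, truth = run_demo()
    print("estimate:", est)
    print("truth:   ", truth)
```

Because the state lives in 3D while the observation is compared in the image plane, each particle is a candidate 3D position that is only scored after projection. A richer appearance-based likelihood could replace `likelihood` without changing the predict, weight, and resample loop.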
