Robust Object Tracking in Crowd Dynamic Scenes Using Explicit Stereo Depth

In this paper, we exploit robust depth information with simple color-shape appearance model on single object tracking in crowd dynamic scenes. Since binocular video streams are captured from a moving camera rig, background subtraction cannot provide a reliable enhancement of region of interest. Our main contribution is a novel tracking strategy to employ explicit stereo depth to track and segment object in crowd dynamic scenes with occlusion handling. Appearance cues including color and shape play a secondary role to further extract the foreground acquired by previous depth-based segmentation. The proposed depth-driven tracking approach can largely alleviate the drifting issue, especially when the object frequently interacts with similar background in long sequence tracking. The problems caused by rapid object appearance change can also be avoided due to the stability of the depth cue. Furthermore, we propose a new, yet simple and effective depth-based scheme to cope with complete occlusion in tracking. From experiments on a large collection of challenging outdoor and indoor sequences, our algorithm demonstrates accurate and reliable tracking performance which outperforms other state-of-the-art competing algorithms.

[1]  Larry S. Davis,et al.  Multi-camera Tracking and Segmentation of Occluded People on Ground Plane Using Search-Guided Particle Filtering , 2006, ECCV.

[2]  Mubarak Shah,et al.  Tracking Multiple Occluding People by Localizing on Multiple Scene Planes , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Luc Van Gool,et al.  Robust Multiperson Tracking from a Mobile Platform , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Daniel P. Huttenlocher,et al.  Efficient Belief Propagation for Early Vision , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[5]  Reinhard Koch,et al.  Dynamic 3D Imaging, DAGM 2009 Workshop, Dyn3D 2009, Jena, Germany, September 9, 2009. Proceedings , 2009, Dyn3D.

[6]  Michael Harville,et al.  Fast, integrated person tracking and activity recognition with plan-view templates from a single stereo camera , 2004, CVPR 2004.

[7]  Huchuan Lu,et al.  Superpixel tracking , 2011, 2011 International Conference on Computer Vision.

[8]  Gregory D. Hager,et al.  A Nonparametric Treatment for Location/Segmentation Based Visual Tracking , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[10]  Larry S. Davis,et al.  W4: Real-Time Surveillance of People and Their Activities , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[11]  Luc Van Gool,et al.  Automatic detection and tracking of pedestrians from a moving stereo rig , 2010 .

[12]  Antti Oulasvirta,et al.  Computer Vision – ECCV 2006 , 2006, Lecture Notes in Computer Science.

[13]  Jitendra Malik,et al.  Tracking as Repeated Figure/Ground Segmentation , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Bastian Leibe,et al.  Real-time multi-person tracking with detector assisted structure propagation , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[15]  Dorin Comaniciu,et al.  Mean Shift: A Robust Approach Toward Feature Space Analysis , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[16]  Andreas Geiger,et al.  Efficient Large-Scale Stereo Matching , 2010, ACCV.

[17]  Mayank Bansal,et al.  A real-time pedestrian detection system based on structure and appearance classification , 2010, 2010 IEEE International Conference on Robotics and Automation.

[18]  Junzhou Huang,et al.  Robust tracking using local sparse appearance model and K-selection , 2011, CVPR 2011.

[19]  Luc Van Gool,et al.  Depth and Appearance for Mobile Scene Analysis , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[20]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[21]  Alan F. Smeaton,et al.  A Framework for Evaluating Stereo-Based Pedestrian Detection Techniques , 2008, IEEE Transactions on Circuits and Systems for Video Technology.

[22]  Dariu Gavrila,et al.  Multi-cue Pedestrian Detection and Tracking from a Moving Vehicle , 2007, International Journal of Computer Vision.

[23]  Horst Bischof,et al.  Hough-based tracking of non-rigid objects , 2011, 2011 International Conference on Computer Vision.

[24]  Trevor Darrell,et al.  Integrated Person Tracking Using Stereo, Color, and Pattern Detection , 2000, International Journal of Computer Vision.

[25]  Shane Brennan,et al.  A Fast Stereo-based System for Detecting and Tracking Pedestrians from a Moving Vehicle , 2009, Int. J. Robotics Res..

[26]  Andreas Geiger,et al.  Visual odometry based on stereo image sequences with RANSAC-based outlier rejection scheme , 2010, 2010 IEEE Intelligent Vehicles Symposium.

[27]  Michael Werman,et al.  Fusing Time-of-Flight Depth and Color for Real-Time Segmentation and Tracking , 2009, Dyn3D.

[28]  Ian D. Reid,et al.  Nonlinear shape manifolds as shape priors in level set segmentation and tracking , 2011, CVPR 2011.

[29]  Gregory D. Hager,et al.  Dynamic Foreground/Background Extraction from Images and Videos using Random Patches , 2006, NIPS.

[30]  Radim Sára,et al.  A Weak Structure Model for Regular Pattern Recognition Applied to Facade Images , 2010, ACCV.

[31]  Jiri Matas,et al.  P-N learning: Bootstrapping binary classifiers by structural constraints , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.