Monocular 3D Scene Modeling and Inference: Understanding Multi-Object Traffic Scenes

Scene understanding has (again) become a focus of computer vision research, leveraging advances in detection, context modeling, and tracking. In this paper, we present a novel probabilistic 3D scene model that encompasses multi-class object detection, object tracking, scene labeling, and 3D geometric relations. This integrated 3D model can represent complex interactions such as inter-object occlusion, physical exclusion between objects, and geometric context. Inference makes it possible to recover 3D scene context and to perform 3D multi-object tracking for objects of multiple categories from a mobile observer, using only monocular video as input. In particular, we show that a joint scene tracklet model for the evidence collected over multiple frames substantially improves performance. The approach is evaluated on two different types of challenging on-board sequences. We first show a substantial improvement over the state of the art in 3D multi-person tracking. Moreover, a similar performance gain is achieved for multi-class 3D tracking of cars and trucks on a new, challenging dataset.
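To make the three interaction types named in the abstract concrete (geometric context, physical exclusion, and inter-object occlusion), here is a minimal Python sketch that evaluates them for a single static scene hypothesis. It is not the paper's model. The level pinhole camera over a flat ground plane, the disc-shaped ground footprints, the hard exclusion prior, and the box-based occlusion test are all simplifying assumptions made here. Every name in it (`Object3D`, `Camera`, `project_box`, `log_exclusion_prior`, `visible_fraction`) is hypothetical. Geometric context enters through `project_box`: an object's distance fixes both its image size and how far below the horizon its foot point appears.

```python
# Hypothetical sketch, not the paper's model. Assumes a level pinhole camera
# above a flat ground plane, upright objects, and disc-shaped footprints.
import math
from dataclasses import dataclass


@dataclass
class Object3D:
    x: float       # lateral ground-plane position (m), camera-centred
    z: float       # depth in front of the camera (m), must be > 0
    width: float   # width (m); also used as the diameter of a disc footprint
    height: float  # physical height above the ground (m)


@dataclass
class Camera:
    f: float   # focal length (px)
    cx: float  # principal point x (px)
    cy: float  # principal point y (px), the horizon row for a level camera
    h: float   # camera height above the ground plane (m)


def project_box(obj, cam):
    """Image box (x0, y0, x1, y1) of an upright object standing on the ground.

    Image y grows downward, so ground contact points lie below the horizon cy.
    """
    x0 = cam.cx + cam.f * (obj.x - obj.width / 2) / obj.z
    x1 = cam.cx + cam.f * (obj.x + obj.width / 2) / obj.z
    y1 = cam.cy + cam.f * cam.h / obj.z                   # foot point
    y0 = cam.cy + cam.f * (cam.h - obj.height) / obj.z    # head point
    return x0, y0, x1, y1


def log_exclusion_prior(objects):
    """Physical exclusion: 0.0 if no two disc footprints overlap, else -inf."""
    for i, a in enumerate(objects):
        for b in objects[i + 1:]:
            if math.hypot(a.x - b.x, a.z - b.z) < (a.width + b.width) / 2:
                return -math.inf
    return 0.0


def visible_fraction(k, objects, cam, n=20):
    """Share of object k's image box not covered by boxes of nearer objects.

    Estimated on an n-by-n grid of sample points, so several partially
    overlapping occluders are handled without exact polygon clipping.
    """
    x0, y0, x1, y1 = project_box(objects[k], cam)
    occluders = [project_box(o, cam) for i, o in enumerate(objects)
                 if i != k and o.z < objects[k].z]
    visible = 0
    for iu in range(n):
        u = x0 + (iu + 0.5) * (x1 - x0) / n
        for iv in range(n):
            v = y0 + (iv + 0.5) * (y1 - y0) / n
            if not any(a0 <= u <= a1 and b0 <= v <= b1
                       for a0, b0, a1, b1 in occluders):
                visible += 1
    return visible / (n * n)


if __name__ == "__main__":
    cam = Camera(f=800.0, cx=320.0, cy=240.0, h=1.2)
    near = Object3D(x=0.0, z=10.0, width=0.6, height=1.8)
    behind = Object3D(x=0.0, z=20.0, width=0.6, height=1.8)
    print(log_exclusion_prior([near, behind]))        # 0.0
    print(visible_fraction(1, [near, behind], cam))   # 0.0 (fully hidden)
```

In the example, the person 20 m away on the same line of sight projects to a box lying entirely inside the box of the person 10 m away, so it is fully occluded, while their 10 m ground-plane separation satisfies the exclusion constraint. Sampling points instead of clipping boxes trades accuracy for simplicity; `n` controls that trade-off.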