Monocular Visual Scene Understanding: Understanding Multi-Object Traffic Scenes

Following recent advances in detection, context modeling, and tracking, scene understanding has been the focus of renewed interest in computer vision research. This paper presents a novel probabilistic 3D scene model that integrates state-of-the-art multiclass object detection, object tracking and scene labeling together with geometric 3D reasoning. Our model is able to represent complex object interactions such as inter-object occlusion, physical exclusion between objects, and geometric context. Inference in this model allows us to jointly recover the 3D scene context and perform 3D multi-object tracking from a mobile observer, for objects of multiple categories, using only monocular video as input. Contrary to many other approaches, our system performs explicit occlusion reasoning and is therefore capable of tracking objects that are partially occluded for extended periods of time, or objects that have never been observed to their full extent. In addition, we show that a joint scene tracklet model for the evidence collected over multiple frames substantially improves performance. The approach is evaluated for different types of challenging onboard sequences. We first show a substantial improvement to the state of the art in 3D multipeople tracking. Moreover, a similar performance gain is achieved for multiclass 3D tracking of cars and trucks on a challenging dataset.

[1]  Geoffrey E. Hinton,et al.  Adaptive Mixtures of Local Experts , 1991, Neural Computation.

[2]  P. Green Reversible jump Markov chain Monte Carlo computation and Bayesian model determination , 1995 .

[3]  Peter Green,et al.  Markov chain Monte Carlo in Practice , 1996 .

[4]  John Platt,et al.  Probabilistic Outputs for Support vector Machines and Comparisons to Regularized Likelihood Methods , 1999 .

[5]  Michael Isard,et al.  BraMBLe: a Bayesian multiple-blob tracker , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[6]  Andrew Zisserman,et al.  Classifying Images of Materials: Achieving Viewpoint and Illumination Independence , 2002, ECCV.

[7]  James J. Little,et al.  A Boosted Particle Filter: Multitarget Detection and Tracking , 2004, ECCV.

[8]  A. Shashua,et al.  Pedestrian detection for driving assistance systems: single-frame classification and system level performance , 2004, IEEE Intelligent Vehicles Symposium, 2004.

[9]  J. Marques,et al.  On-line Tracking Groups of Pedestrians with Bayesian Networks , 2004 .

[10]  Chiou-Shann Fuh,et al.  Fast Object Detection with Occlusions , 2004, ECCV.

[11]  Antonio Torralba,et al.  Contextual Priming for Object Detection , 2003, International Journal of Computer Vision.

[12]  Zhuowen Tu,et al.  Image Parsing: Unifying Segmentation, Detection, and Recognition , 2005, International Journal of Computer Vision.

[13]  Frank Dellaert,et al.  MCMC-based particle filtering for tracking a variable number of interacting targets , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  A. G. Amitha Perera,et al.  A unified framework for tracking through occlusions and across sensor gaps , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[15]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[16]  Yacov Hel-Or,et al.  Real-time pattern matching using projection kernels , 2003, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Uwe Franke,et al.  6D-Vision: Fusion of Stereo and Motion for Robust Environment Perception , 2005, DAGM-Symposium.

[18]  Alexei A. Efros,et al.  Putting Objects in Perspective , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[19]  Michael J. Black,et al.  Measure Locally, Reason Globally: Occlusion-sensitive Articulated Pose Estimation , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[20]  Navneet Dalal,et al.  Finding People in Images and Videos , 2006 .

[21]  A. G. Amitha Perera,et al.  Multi-Object Tracking Through Simultaneous Long Occlusions and Split-Merge Conditions , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[22]  Dariu Gavrila,et al.  Multi-cue Pedestrian Detection and Tracking from a Moving Vehicle , 2007, International Journal of Computer Vision.

[23]  Jamie Shotton,et al.  The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[24]  Antonio Criminisi,et al.  TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation , 2006, ECCV.

[25]  Stefan Carlsson,et al.  Multi-Target Tracking - Linking Identities using Bayesian Network Inference , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[26]  Larry S. Davis,et al.  Bilattice-based Logical Reasoning for Human Detection , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Antonio Torralba,et al.  Sharing Visual Features for Multiclass and Multiview Object Detection , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Ram Nevatia,et al.  Detection and Segmentation of Multiple, Partially Occluded Objects by Grouping, Merging, Assigning Part Detection Responses , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Ramakant Nevatia,et al.  Global data association for multi-object tracking using network flows , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Luc Van Gool,et al.  A mobile vision system for robust multi-person tracking , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Ramakant Nevatia,et al.  Segmentation and Tracking of Multiple Humans in Crowded Environments , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  Stefan Roth,et al.  People-tracking-by-detection and people-detection-by-tracking , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Ramakant Nevatia,et al.  Robust Object Tracking by Hierarchical Association of Detection Responses , 2008, ECCV.

[34]  Bo Wu,et al.  Pedestrian Tracking by Associating Tracklets using Detection Residuals , 2008, 2008 IEEE Workshop on Motion and video Computing.

[35]  Luc Van Gool,et al.  Robust Multiperson Tracking from a Mobile Platform , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Luc Van Gool,et al.  Robust tracking-by-detection using a detector confidence particle filter , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[37]  Andrew Zisserman,et al.  Structured output regression for detection with partial truncation , 2009, NIPS.

[38]  Ramakant Nevatia,et al.  Learning to associate: HybridBoosted multi-target tracker for crowded scene , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Konrad Schindler,et al.  Improved Multi-Person Tracking with Active Occlusion Handling , 2009, ICRA 2009.

[40]  B. Schiele,et al.  Multi-cue onboard pedestrian detection , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  H. Ai,et al.  Multi-object tracking through occlusions by local tracklets filtering and global tracklets association with detection responses , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[42]  Shuicheng Yan,et al.  An HOG-LBP human detector with partial occlusion handling , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[43]  Luc Van Gool,et al.  Segmentation-Based Urban Traffic Scene Understanding , 2009, BMVC.

[44]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[45]  Silvio Savarese,et al.  Multiple Target Tracking in World Coordinate with Single, Minimally Calibrated Camera , 2010, ECCV.

[46]  Dariu Gavrila,et al.  Multi-cue pedestrian classification with partial occlusion handling , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[47]  Konrad Schindler,et al.  Globally Optimal Multi-target Tracking on a Hexagonal Lattice , 2010, ECCV.

[48]  Bernt Schiele,et al.  Monocular 3D Scene Modeling and Inference: Understanding Multi-Object Traffic Scenes , 2010, ECCV.

[49]  Alexei A. Efros,et al.  Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics , 2010, ECCV.

[50]  Bernt Schiele,et al.  New features and insights for pedestrian detection , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[51]  Martin Lauer,et al.  A generative model for 3D urban scene understanding from movable platforms , 2011, CVPR 2011.

[52]  Charless C. Fowlkes,et al.  Globally-optimal greedy algorithms for tracking a variable number of objects , 2011, CVPR 2011.

[53]  Pascal Fua,et al.  Tracking multiple people under global appearance constraints , 2011, 2011 International Conference on Computer Vision.

[54]  Bernt Schiele,et al.  Monocular 3D scene understanding with explicit occlusion reasoning , 2011, CVPR 2011.

[55]  Bohyung Han,et al.  Learning occlusion with likelihoods for visual tracking , 2011, 2011 International Conference on Computer Vision.

[56]  Daphne Koller,et al.  A segmentation-aware object detection model with occlusion handling , 2011, CVPR 2011.

[57]  Pietro Perona,et al.  Pedestrian Detection: An Evaluation of the State of the Art , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.