Latent Hierarchical Part Based Models for Road Scene Understanding

Road scenes can be naturally interpreted in terms of a hierarchical structure of parts and sub-parts, which captures different degrees of abstraction at different levels of the hierarchy. We introduce Latent Hierarchical Part based Models (LHPMs), which provide a promising framework for interpreting an image using a tree structure when root filters for non-leaf nodes are not available. While Hierarchical Part based Models (HPMs) have been developed in the context of object detection and pose estimation, their application to scene understanding has been limited by the requirement of root filters for non-leaf nodes. In this work, we propose a generalization of HPMs that dispenses with the need for root filters for non-leaf nodes by treating them as latent variables within a Dynamic Programming based optimization scheme. We experimentally demonstrate the importance of LHPMs for road scene understanding on the Continental and KITTI datasets. We find that the hierarchical interpretation leads to intuitive scene descriptions, which are central for autonomous driving.
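To make the idea of latent non-leaf nodes concrete, the following is a minimal sketch of max-sum Dynamic Programming on a part tree, not the authors' implementation. It assumes a toy setting that the abstract does not specify: positions are discretised along one axis, the deformation cost is quadratic with a fixed weight, and a latent node is modelled simply as a node with no appearance (filter) score, so its placement is driven only by its children. The names `Node`, `deformation`, `upward` and `infer` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    # Appearance score per candidate position; None marks a latent node
    # (assumption: a latent node contributes no filter response of its own).
    unary: Optional[List[float]] = None


def deformation(parent_pos: int, child_pos: int, weight: float = 0.5) -> float:
    # Assumed quadratic penalty for displacing a child relative to its parent.
    return -weight * (parent_pos - child_pos) ** 2


def upward(node: Node, n_pos: int, back: Dict[Tuple[str, int], int]) -> List[float]:
    # Best score of the subtree rooted at `node`, for each position of `node`.
    scores = list(node.unary) if node.unary is not None else [0.0] * n_pos
    for child in node.children:
        child_scores = upward(child, n_pos, back)
        for p in range(n_pos):
            best_c = max(
                range(n_pos),
                key=lambda c: child_scores[c] + deformation(p, c),
            )
            scores[p] += child_scores[best_c] + deformation(p, best_c)
            back[(child.name, p)] = best_c  # remember the argmax for backtracking
    return scores


def infer(root: Node, n_pos: int) -> Tuple[float, Dict[str, int]]:
    back: Dict[Tuple[str, int], int] = {}
    root_scores = upward(root, n_pos, back)
    best_root = max(range(n_pos), key=root_scores.__getitem__)
    placement = {root.name: best_root}
    stack = [root]
    while stack:  # top-down pass: read off each child's best position
        node = stack.pop()
        for child in node.children:
            placement[child.name] = back[(child.name, placement[node.name])]
            stack.append(child)
    return root_scores[best_root], placement


# Toy scene: a latent root with two observed leaf parts.
road = Node("road", unary=[0.0, 0.0, 0.0, 4.0, 0.0])
sidewalk = Node("sidewalk", unary=[0.0, 3.0, 0.0, 0.0, 0.0])
scene = Node("scene", children=[road, sidewalk])  # no unary: latent
score, placement = infer(scene, n_pos=5)
```

In this toy case the latent `scene` node settles between its two children, because that position minimises the total deformation penalty; no root filter is ever evaluated for it.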
