Automatic Recognition and Understanding of the Driving Environment for Driver Feedback

DISCLAIMER: The contents of this report reflect the views of the authors, who are responsible for the facts and the accuracy of the information presented herein. This document is disseminated under the […]

A smart driving system must consider two key elements in order to generate recommendations and make driving decisions that are effective and accurate: the environment of the car and the behavior of the driver. Our long-term goal is to develop techniques for building internal models of the vehicle's static environment (objects, features, terrain) and of its dynamic environment (people and vehicles moving around the vehicle) from sensor data. These models must operate online and provide the information necessary to make recommendations, generate alarms, or take emergency action. Our overall approach is to combine recent progress in machine perception with the rapid advent of onboard sensors and the availability of external data sources, such as maps.

Understanding the environment of a vehicle can be envisioned at different levels of detail, from low-level signals that characterize the location of potential hazards to high-level descriptions that include semantic information, such as recognizing specific types of objects. Current systems already include sensors able to produce a coarse map of obstacles in regions around the vehicle, but these capabilities are limited to fairly coarse descriptions of the environment. A notable exception is in the area of people and car detection (e.g., the MobileEye system, www.us.mobileye.com), for which a commercial product is already available. However, even in this case, interpretation of the sensor data is limited to the location and motion of the object and does not include higher-level predictive information about patterns of motion and future actions.
We believe that now, given the availability of sensors that provide rich data, there is an opportunity to develop techniques that generate far more complete and higher-level descriptions of the vehicle's environment than was ever possible before. Given input (images and 3D data) from these sensors, the first component of our approach relies on recent developments in the general area of scene understanding. Specifically, our approach is to extend state-of-the-art machine perception techniques in three areas: 1) scene understanding from images, in which objects, regions, and features are identified from image input; 2) scene understanding from the type of 3D point clouds acquired from, for example, stereo or LIDAR systems; and 3) analysis …
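To make the first of these areas concrete, the sketch below is a deliberately minimal, toy version of one family of image-based scene understanding methods: nonparametric label transfer, in which a query image is labeled by retrieving the most similar training image and copying its per-pixel semantic labels. The descriptor (a plain color histogram), the data, and all function names here are illustrative assumptions, not the report's actual system; real scene-parsing pipelines use much richer features and retrieval.

```python
import numpy as np

def global_descriptor(image, bins=8):
    """Coarse global descriptor: a normalized per-channel color histogram.
    (Hypothetical stand-in for the richer features a real system would use.)"""
    hists = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
             for c in range(image.shape[-1])]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def transfer_labels(query, train_images, train_labels):
    """Label the query image by copying the per-pixel labels of the
    training image whose descriptor is nearest to the query's."""
    q = global_descriptor(query)
    dists = [np.linalg.norm(q - global_descriptor(t)) for t in train_images]
    return train_labels[int(np.argmin(dists))]

# Toy usage: two synthetic "training scenes" (dark road-like pixels labeled 0,
# bright sky-like pixels labeled 1) and a road-like query image.
rng = np.random.default_rng(0)
road = rng.integers(40, 60, size=(16, 16, 3)).astype(np.uint8)
sky = rng.integers(200, 230, size=(16, 16, 3)).astype(np.uint8)
labels = [np.zeros((16, 16), dtype=int), np.ones((16, 16), dtype=int)]
query = rng.integers(45, 65, size=(16, 16, 3)).astype(np.uint8)
predicted = transfer_labels(query, [road, sky], labels)
```

Even this crude retrieval step illustrates the appeal of the approach: no per-class model is trained, so new semantic categories can be added simply by adding labeled images to the training set.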
