The contribution of context information: A case study of object recognition in an intelligent car

In this article, we explore the potential contribution of multimodal context information to object detection in an "intelligent car". The car platform used incorporates subsystems for detecting objects from local visual patterns, as well as for estimating global scene properties (sometimes denoted "scene context" or just "context") such as the shape of the road area or the 3D position of the ground plane. Annotated data recorded on this platform are publicly available as the "HRI RoadTraffic" vehicle video dataset, which forms the basis for this investigation. To quantify the contribution of context information, we investigate whether it can be used to infer object identity with little or no reference to local patterns of visual appearance. Using a challenging vehicle detection task based on the "HRI RoadTraffic" dataset, we train selected algorithms ("context models") to estimate object identity from context information alone. In our performance evaluations, we also analyze the effect of typical real-world conditions (noise, high input dimensionality, environmental variation) on context model performance. As a principal result, we show that learning context models is feasible with all tested algorithms, and that object identity can be estimated from context information with accuracy similar to that achieved by local pattern recognition methods. We also find that basis function representations[1] (also known as "population codes") allow the simplest (and therefore most efficient) learning methods to perform best in the benchmark, suggesting that the use of context is feasible even in systems operating under strong performance constraints.
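To illustrate why a basis function representation can let a very simple learner perform well, the following is a minimal sketch and not the method used in the article. It assumes a single hypothetical scalar context feature in [0, 1], Gaussian basis functions with hand-picked centers and width, toy labels, and a plain least-squares linear readout; the names `population_code`, `centers`, `width`, and `predict` are all illustrative.

```python
import numpy as np

def population_code(x, centers, width):
    """Encode scalars as Gaussian basis-function activations (one row per sample)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)

def with_bias(phi):
    """Append a constant column so the linear readout can learn an offset."""
    return np.hstack([phi, np.ones((phi.shape[0], 1))])

# Assumed setup: 11 evenly spaced basis functions over a normalized feature.
centers = np.linspace(0.0, 1.0, 11)
width = 0.1

# Toy data (not from the HRI RoadTraffic dataset): the "object" label is 1
# only inside a band of context values, which a linear model on the raw
# scalar cannot represent.
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=200)
y_train = ((x_train > 0.3) & (x_train < 0.6)).astype(float)

# Linear least-squares readout on top of the population code.
phi_train = with_bias(population_code(x_train, centers, width))
weights, *_ = np.linalg.lstsq(phi_train, y_train, rcond=None)

def predict(x):
    """Return the linear readout score for one or more context values."""
    return with_bias(population_code(x, centers, width)) @ weights

scores = predict([0.45, 0.9])  # one value inside the band, one outside
```

In this toy setting, the nonlinearity lives entirely in the fixed encoding, so training reduces to a single linear solve. That is one plausible reading of why simple learners could remain competitive under tight computational budgets.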

[1]  Alexei A. Efros,et al.  Putting Objects in Perspective , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[2]  D. Knill,et al.  The Bayesian brain: the role of uncertainty in neural coding and computation , 2004, Trends in Neurosciences.

[3]  Jannik Fritsch,et al.  Biased Competition in Visual Processing Hierarchies: A Learning Approach Using Multiple Cues , 2011, Cognitive Computation.

[4]  Stefan Schaal,et al.  Locally Weighted Projection Regression: Incremental Real Time Learning in High Dimensional Space , 2000, ICML.

[5]  Luc Van Gool,et al.  Dynamic 3D Scene Analysis from a Moving Vehicle , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  S. Haykin,et al.  Neural Networks: A Comprehensive Foundation , 1994 .

[7]  C. Koch,et al.  Computational modelling of visual attention , 2001, Nature Reviews Neuroscience.

[8]  Nando de Freitas,et al.  Target-directed attention: Sequential decision-making for gaze planning , 2008, 2008 IEEE International Conference on Robotics and Automation.

[9]  R. Zemel,et al.  Inference and computation with population codes. , 2003, Annual review of neuroscience.

[10]  Inna Mikhailova,et al.  Organizing multimodal perception for autonomous learning and interactive systems , 2008, Humanoids 2008 - 8th IEEE-RAS International Conference on Humanoid Robots.

[11]  Y. LeCun,et al.  Learning methods for generic object recognition with invariance to pose and lighting , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.

[12]  Inna Mikhailova,et al.  Expectation-driven autonomous learning and interaction system , 2008, Humanoids 2008 - 8th IEEE-RAS International Conference on Humanoid Robots.

[13]  Jannik Fritsch,et al.  Cross-module learning as a first step towards a cognitive system concept , 2008 .

[14]  Charless C. Fowlkes,et al.  Discriminative Models for Multi-Class Object Layout , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[15]  Heiko Wersing,et al.  System approach for multi-purpose representations of traffic scene elements , 2010, 13th International IEEE Conference on Intelligent Transportation Systems.

[16]  Christopher M. Bishop,et al.  Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .

[17]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[18]  Heiko Wersing,et al.  Learning Optimized Features for Hierarchical Models of Invariant Object Recognition , 2003, Neural Computation.

[19]  Christian Igel,et al.  Evolutionary Optimization of Sequence Kernels for Detection of Bacterial Gene Starts , 2006, ICANN.

[20]  Christian Goerick,et al.  Researching and developing a real-time infrastructure for intelligent systems - Evolution of an integrated approach , 2008, Robotics Auton. Syst.

[21]  Dariu Gavrila,et al.  Monocular Pedestrian Detection: Survey and Experiments , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  T. Rogers,et al.  Where do you know what you know? The representation of semantic knowledge in the human brain , 2007, Nature Reviews Neuroscience.

[23]  Motonobu Hattori,et al.  Avoiding Catastrophic Forgetting by a Dual-Network Memory Model Using a Chaotic Neural Network , 2009 .

[24]  Radford M. Neal.  Pattern Recognition and Machine Learning , 2007, Technometrics.

[25]  Antonio Torralba,et al.  Object Detection and Localization Using Local and Global Features , 2006, Toward Category-Level Object Recognition.

[26]  Christian Igel,et al.  Evolutionary Optimization of Neural Networks for Face Detection , 2004, ESANN.

[27]  Wei Ji Ma,et al.  Bayesian inference with probabilistic population codes , 2006, Nature Neuroscience.

[28]  Christof Koch,et al.  Attentional Selection for Object Recognition - A Gentle Way , 2002, Biologically Motivated Computer Vision.

[29]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[30]  Barbara Hammer,et al.  Neural Smithing – Supervised Learning in Feedforward Artificial Neural Networks , 2001, Pattern Analysis & Applications.

[31]  Kevin P. Murphy,et al.  A non-myopic approach to visual search , 2007, Fourth Canadian Conference on Computer and Robot Vision (CRV '07).

[32]  S. Hochstein,et al.  View from the Top Hierarchies and Reverse Hierarchies in the Visual System , 2002, Neuron.

[33]  Christian Goerick,et al.  Towards an Understanding of Hierarchical Architectures , 2011, IEEE Transactions on Autonomous Mental Development.

[34]  Jannik Fritsch,et al.  Computationally Efficient Neural Field Dynamics , 2008, ESANN.

[35]  Magdalena Szczot,et al.  Incorporating contextual information in pedestrian recognition , 2009, 2009 IEEE Intelligent Vehicles Symposium.