Fully convolutional neural networks for dynamic object detection in grid maps

Grid maps are widely used in robotics to represent obstacles in the environment, and differentiating dynamic objects from static infrastructure is essential for many practical applications. In this work, we present a method that uses a deep convolutional neural network (CNN) to infer whether a grid cell covers a moving object. Tracking approaches typically use, e.g., a particle filter to estimate grid cell velocities and then decide for each grid cell individually based on this estimate. In contrast, our approach feeds the entire grid map as an input image to a CNN that inspects a larger area around each cell and thus takes the structural appearance in the grid map into account when making a decision. Compared to our reference method, our concept yields a performance increase from 83.9% to 97.2%. A runtime-optimized version of our approach yields similar improvements with an execution time of just 10 milliseconds.
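The idea of classifying every cell from its surrounding neighbourhood can be illustrated with a minimal sketch. It is not the network described in this work: the two-layer layout, the kernel values, the biases, and all function names below are assumptions chosen only to show how a fully convolutional stack maps a grid map to a same-sized map of per-cell scores, and how stacking layers widens the area each decision depends on.

```python
import math

def conv2d_same(grid, kernel, bias=0.0):
    """Zero-padded 'same' 2D convolution (cross-correlation) on nested lists."""
    h, w = len(grid), len(grid[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = bias
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - ph, c + j - pw
                    if 0 <= rr < h and 0 <= cc < w:  # cells outside the map count as 0
                        acc += kernel[i][j] * grid[rr][cc]
            out[r][c] = acc
    return out

def relu(grid):
    return [[max(0.0, v) for v in row] for row in grid]

def sigmoid(grid):
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in grid]

def dynamic_cell_scores(grid, k1, k2, b1=0.0, b2=0.0):
    """Hypothetical two-layer fully convolutional scorer: one score per grid cell."""
    hidden = relu(conv2d_same(grid, k1, b1))
    return sigmoid(conv2d_same(hidden, k2, b2))

def receptive_field(kernel_sizes):
    """Side length of the input area seen by one output cell (stride 1, no dilation)."""
    return 1 + sum(k - 1 for k in kernel_sizes)

# Toy 6x8 occupancy grid (1.0 = occupied); values are made up for illustration.
grid = [[0.0] * 8 for _ in range(6)]
grid[2][3] = grid[2][4] = grid[3][3] = grid[3][4] = 1.0

# Placeholder (untrained) kernels; a real network would learn these weights.
k1 = [[1.0 / 9.0] * 3 for _ in range(3)]
k2 = [[1.0 / 9.0] * 3 for _ in range(3)]

scores = dynamic_cell_scores(grid, k1, k2, b2=-0.5)
print(len(scores), len(scores[0]))   # output has the same shape as the input grid
print(receptive_field([3, 3]))       # side length of the area each score depends on
```

Because every layer is a convolution, the same weights are applied at every position and the output keeps the grid's resolution, so a single forward pass scores all cells at once instead of deciding cell by cell.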
