Multi-Modal Obstacle Detection in Unstructured Environments with Conditional Random Fields

Reliable obstacle detection and classification in rough and unstructured terrain such as agricultural fields or orchards remains a challenging problem. These environments involve large variations in both geometry and appearance, which limits perception systems that rely on a single sensor modality. Geometrically, tall grass, fallen leaves, or terrain roughness can mistakenly be perceived as nontraversable or might even obscure actual obstacles. Likewise, traversable grass or dirt roads and obstacles such as trees and bushes might be visually ambiguous. In this paper, we combine appearance- and geometry-based detection methods by probabilistically fusing lidar and camera sensing with semantic segmentation using a conditional random field. We apply a state-of-the-art multimodal fusion algorithm from the scene analysis domain and adjust it for obstacle detection in agriculture with moving ground vehicles. This involves explicitly handling sparse point cloud data and exploiting spatial, temporal, and multimodal links between corresponding 2D and 3D regions. The proposed method was evaluated on a diverse data set, comprising a dairy paddock and different orchards, gathered with a perception research robot in Australia. Results showed that for a two-class classification problem (ground and nonground), only the camera benefited from information provided by the other modality, with an increase in the mean classification score of 0.5%. However, as more classes were introduced (ground, sky, vegetation, and object), both modalities complemented each other, with improvements of 1.4% in 2D and 7.9% in 3D. Finally, introducing temporal links between successive frames resulted in improvements of 0.2% in 2D and 1.5% in 3D.
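
As a rough illustration of how multimodal links in a conditional random field can let one modality correct the other, the sketch below fuses per-region class probabilities from a camera and a lidar. It is a toy example under stated assumptions, not the method evaluated in the paper: the function name `icm_fuse`, the Potts pairwise penalty, the single `weight` parameter, and the use of iterated conditional modes (ICM) for inference are all hypothetical choices made for brevity.

```python
import numpy as np

def icm_fuse(unary_2d, unary_3d, links, weight=1.0, n_iters=10):
    """Jointly label 2D and 3D regions with a toy pairwise CRF.

    unary_2d: (N2, C) class probabilities for image regions (assumed given).
    unary_3d: (N3, C) class probabilities for point cloud regions (assumed given).
    links:    iterable of (i2d, i3d) pairs connecting corresponding regions.
    weight:   Potts penalty paid when two linked regions disagree (assumption).
    """
    # Stack both modalities into one node set; 3D nodes are offset by N2.
    probs = np.vstack([unary_2d, unary_3d])
    cost = -np.log(np.clip(probs, 1e-9, 1.0))  # unary energy per node and class
    n_nodes, n_classes = cost.shape
    offset = len(unary_2d)

    # Build an undirected adjacency list from the multimodal links.
    neighbours = [[] for _ in range(n_nodes)]
    for i2d, i3d in links:
        neighbours[i2d].append(offset + i3d)
        neighbours[offset + i3d].append(i2d)

    # Start from the independent per-node decision, then refine with ICM.
    labels = cost.argmin(axis=1)
    classes = np.arange(n_classes)
    for _ in range(n_iters):
        changed = False
        for n in range(n_nodes):
            energy = cost[n].copy()
            for m in neighbours[n]:
                # Potts term: penalise every class that differs from the
                # neighbour's current label.
                energy += weight * (classes != labels[m])
            best = int(energy.argmin())
            if best != labels[n]:
                labels[n] = best
                changed = True
        if not changed:
            break
    return labels[:offset], labels[offset:]
```

With one confident camera region (`[0.9, 0.1]`) linked to one uncertain lidar region (`[0.45, 0.55]`), the lidar label flips to agree with the camera; without the link, each region keeps its own most likely class. Spatial links within a modality and temporal links between frames could be added to the same adjacency list in this sketch.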
