The Raincouver Scene Parsing Benchmark for Self-Driving in Adverse Weather and at Night

Self-driving vehicles have the potential to transform the way we travel. Their development is at a pivotal point, as a growing number of industrial and academic research organizations are bringing these technologies into controlled but real-world settings. An essential capability of a self-driving vehicle is environment understanding: Where are the pedestrians, the other vehicles, and the drivable space? In computer and robot vision, the task of identifying semantic categories at the per-pixel level is known as scene parsing or semantic segmentation. While much progress has been made in scene parsing in recent years, current datasets for training and benchmarking scene parsing algorithms focus on nominal driving conditions: fair weather and mostly daytime lighting. To complement the standard benchmarks, we introduce the Raincouver scene parsing benchmark, which to our knowledge is the first scene parsing benchmark to focus on challenging rainy driving conditions, during the day, at dusk, and at night. Our dataset comprises half an hour of driving video captured on the roads of Vancouver, Canada, and 326 frames with hand-annotated pixelwise semantic labels.
