Towards View-Invariant Intersection Recognition from Videos using Deep Network Ensembles

This paper strives to answer the following question: Is it possible to recognize an intersection when seen from different road segments that constitute the intersection? An intersection or a junction typically is a meeting point of three or four road segments. Its recognition from a road segment that is transverse to or 180 degrees apart from its previous sighting is an extremely challenging and yet a very relevant problem to be addressed from the point of view of both autonomous driving as well as loop detection. This paper formulates this as a problem of video recognition and proposes a novel LSTM based Siamese style deep network for video recognition. For what is indeed a challenging problem and the limited annotated dataset available we show competitive results of recognizing intersections when approached from diverse viewpoints or road segments. Specifically, we tabulate effective recognition accuracy even as the approaches to the intersection being compared are disparate both in terms of viewpoints and weather/illumination conditions. We show competitive results on both synthetic yet highly realistic data mined from the gaming platform GTA as well as on real world data made available through Mapillary.

[1]  Gordon Wyeth,et al.  OpenFABMAP: An open source toolbox for appearance-based loop closure detection , 2012, 2012 IEEE International Conference on Robotics and Automation.

[2]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Jean-Arcady Meyer,et al.  Fast and Incremental Method for Loop-Closure Detection Using Bags of Visual Words , 2008, IEEE Transactions on Robotics.

[4]  Germán Ros,et al.  CARLA: An Open Urban Driving Simulator , 2017, CoRL.

[5]  Kuldip K. Paliwal,et al.  Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..

[6]  Gregory R. Koch,et al.  Siamese Neural Networks for One-Shot Image Recognition , 2015 .

[7]  K. Madhava Krishna,et al.  Have i reached the intersection: A deep learning-based approach for intersection detection from monocular cameras , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[8]  Qingquan Li,et al.  VeloRegistration based intersection detection for autonomous driving in challenging urban scenarios , 2012, 2012 15th International IEEE Conference on Intelligent Transportation Systems.

[9]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[10]  Dorian Gálvez-López,et al.  Bags of Binary Words for Fast Place Recognition in Image Sequences , 2012, IEEE Transactions on Robotics.

[11]  Michael Milford,et al.  Place Recognition with ConvNet Landmarks: Viewpoint-Robust, Condition-Robust, Training-Free , 2015, Robotics: Science and Systems.

[12]  Paul Newman,et al.  Made to measure: Bespoke landmarks for 24-hour, all-weather localisation with a camera , 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[13]  K. Madhava Krishna,et al.  Visual localization in highly crowded urban environments , 2013, 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[14]  Vladlen Koltun,et al.  Playing for Data: Ground Truth from Computer Games , 2016, ECCV.

[15]  Shuzhi Sam Ge,et al.  Intersection detection and recognition for autonomous urban driving using a virtual cylindrical scanner , 2014 .

[16]  Torsten Sattler,et al.  Semantic Visual Localization , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17]  Gordon Wyeth,et al.  SeqSLAM: Visual route-based navigation for sunny summer days and stormy winter nights , 2012, 2012 IEEE International Conference on Robotics and Automation.

[18]  Antonio M. López,et al.  The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[21]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[22]  Fei-Fei Li,et al.  Visualizing and Understanding Recurrent Networks , 2015, ArXiv.

[23]  K. Madhava Krishna,et al.  Large scale visual localization in urban environments , 2011, 2011 IEEE International Conference on Robotics and Automation.

[24]  Abel Gawel,et al.  X-View: Graph-Based Semantic Multiview Localization , 2017, IEEE Robotics and Automation Letters.

[25]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Michael Milford,et al.  Deep learning features at scale for visual place recognition , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[27]  John C. Platt,et al.  Learning Discriminative Projections for Text Similarity Measures , 2011, CoNLL.

[28]  Peter Kontschieder,et al.  The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).