CNN based Multi-View Object Detection and Association

We present a system that detects and associates traffic participants in camera images showing a street scene from different points of view. The system is based on a multitask CNN architecture, and both detection and association are performed within the network. The association between images is estimated without explicit knowledge of the scene geometry or camera calibration. One of the main applications of our system is currently test areas for autonomous vehicles. For this use case, two kinds of data are of great importance: ground truth information for testing environment perception, and external environment information sent to autonomous vehicles via car-to-infrastructure communication. This is particularly relevant in complex scenarios such as large intersections. Our system shows promising results on the association task for large intersections captured from eight different points of view.
