DeepBEV: A Conditional Adversarial Network for Bird's Eye View Generation

Obtaining a meaningful, interpretable, yet compact representation of an autonomous vehicle's immediate surroundings is paramount for both effective operation and safety. This paper addresses this need by representing semantically important objects in a top-down, ego-centric bird's eye view. The novelty of this work lies in formulating the problem as an adversarial learning task, in which a generator model is trained to produce bird's eye view representations plausible enough to be mistaken for ground-truth samples. This is achieved with a Wasserstein Generative Adversarial Network (WGAN)-based model conditioned on object detections from monocular RGB images and their corresponding bounding boxes. Extensive experiments show that our model is more robust to novel data than strictly supervised benchmark models, while being a fraction of the size of the next-best model.
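To make the conditional WGAN formulation concrete, the sketch below shows one plausible critic update under that objective. It is a minimal illustration only, assuming a PyTorch implementation with gradient-penalty training; the names generator, critic, and the tensor layouts are hypothetical stand-ins, not the paper's actual code.

    import torch

    def critic_loss(critic, generator, real_bev, cond, z, gp_weight=10.0):
        # Conditional WGAN critic objective with gradient penalty.
        # real_bev: ground-truth bird's eye view grids, shape (B, C, H, W)
        # cond:     conditioning input, e.g. encoded detections and boxes
        # z:        latent noise fed to the generator
        fake_bev = generator(z, cond).detach()

        # Wasserstein estimate: the critic should score real samples above fakes.
        w_loss = critic(fake_bev, cond).mean() - critic(real_bev, cond).mean()

        # Gradient penalty on random interpolates, the usual surrogate for
        # the 1-Lipschitz constraint on the critic.
        eps = torch.rand(real_bev.size(0), 1, 1, 1, device=real_bev.device)
        interp = (eps * real_bev + (1.0 - eps) * fake_bev).requires_grad_(True)
        grads = torch.autograd.grad(critic(interp, cond).sum(), interp,
                                    create_graph=True)[0]
        penalty = ((grads.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()
        return w_loss + gp_weight * penalty

The generator is then updated in alternation, minimising the negated critic score on its own conditioned outputs, which pulls generated views toward the ground-truth distribution.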
