Improving the Efficiency of 3D Monocular Object Detection and Tracking for Road and Railway Smart Mobility

Real-time three-dimensional (3D) object detection and tracking is an important task for autonomous vehicles and for road and railway smart mobility, because it allows them to analyze their environment for navigation and obstacle avoidance. In this paper, we improve the efficiency of 3D monocular object detection through dataset combination, knowledge distillation, and a lightweight model design. First, we combine real and synthetic datasets to increase the diversity and richness of the training data. Then, we use knowledge distillation to transfer knowledge from a large, pre-trained model to a smaller, lightweight model. Finally, we create a lightweight model by selecting combinations of width, depth, and resolution that reach a target complexity and computation time. Our experiments showed that each method improves either the accuracy or the efficiency of our model with no significant drawbacks. Combining these approaches is especially useful in resource-constrained environments, such as self-driving cars and railway systems.
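To make the knowledge distillation step more concrete, the sketch below shows the classic soft-target formulation of Hinton et al. for a single classification output. It is only an illustration under stated assumptions: the abstract does not specify the loss used for 3D detection, and the names `distillation_loss`, `temperature`, and `alpha`, along with their default values, are hypothetical choices made for this example.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Temperature-scaled log-softmax, stabilised by subtracting the max."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    Assumption: a plain classification head. A detector would also need
    terms for box regression, which are not modelled here.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)

    # KL(teacher || student) on the temperature-softened distributions.
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))

    # Standard cross-entropy of the student against the ground-truth label.
    hard = -log_softmax(student_logits)[label]

    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * hard


teacher = [2.0, 0.5, -1.0]
print(distillation_loss(teacher, teacher, label=0))           # student agrees
print(distillation_loss([-1.0, 0.5, 2.0], teacher, label=0))  # student disagrees
```

A student whose logits match the teacher's incurs no soft-target penalty, so only the hard-label term remains; a student that disagrees is penalised by both terms.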
