Semi-Automatic Multi-Object Video Annotation Based on Tracking, Prediction and Semantic Segmentation

Instrumented and autonomous vehicles can generate very high volumes of video data per car per day, all of which must be annotated with a high degree of granularity, detail, and accuracy. Annotating video at this level and volume, whether manually or automatically, is not a trivial task. Manual annotation is slow and expensive, whereas automatic annotation algorithms have improved significantly over the past few years. This demonstration applies multi-object tracking, path prediction, and semantic segmentation to facilitate multi-object video annotation for enriched tracklet extraction. The annotation task currently relies on these three approaches, and more will be included in the future.
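
To illustrate how tracking can reduce manual effort in such a tool, the sketch below carries object IDs from one annotated frame to the detections in the next frame using greedy intersection-over-union (IoU) matching. It is a minimal illustration under stated assumptions, not the method used in the demonstration: the function names `iou` and `propagate_labels`, the `(x1, y1, x2, y2)` box format, and the `min_iou` threshold of 0.3 are all hypothetical choices made for this example.

```python
from typing import Dict, List, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def propagate_labels(
    tracks: Dict[int, Box],
    detections: List[Box],
    min_iou: float = 0.3,  # hypothetical threshold, chosen for illustration
) -> Dict[int, Box]:
    """Greedily assign each track ID to its best-overlapping unclaimed detection.

    Tracks with no detection above min_iou are left out of the result, so an
    annotator would only need to review those objects by hand.
    """
    # Score every (track, detection) pair and visit them from best to worst.
    pairs = sorted(
        (
            (iou(box, det), track_id, det_idx)
            for track_id, box in tracks.items()
            for det_idx, det in enumerate(detections)
        ),
        reverse=True,
    )
    matched: Dict[int, Box] = {}
    used_detections = set()
    for score, track_id, det_idx in pairs:
        if score < min_iou:
            break  # all remaining pairs overlap even less
        if track_id in matched or det_idx in used_detections:
            continue
        matched[track_id] = detections[det_idx]
        used_detections.add(det_idx)
    return matched


if __name__ == "__main__":
    previous_frame = {1: (0, 0, 10, 10), 2: (50, 50, 60, 60)}
    next_frame = [(51, 50, 61, 60), (1, 0, 11, 10), (200, 200, 210, 210)]
    print(propagate_labels(previous_frame, next_frame))
```

Greedy matching is used here only because it is short and easy to follow. A motion model such as a Kalman filter could supply predicted boxes in place of the previous-frame boxes before the matching step.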
