The ParallelEye Dataset: Constructing Large-Scale Artificial Scenes for Traffic Vision Research

Video image datasets play an essential role in the design and evaluation of traffic vision algorithms. Nevertheless, a longstanding inconvenience is that manually collecting and annotating large-scale, diversified datasets from real scenes is time-consuming and error-prone. For this reason, virtual datasets have begun to serve as proxies for real datasets. In this paper, we propose to construct large-scale artificial scenes for traffic vision research and generate a new virtual dataset called "ParallelEye". First, street map data is used to build a 3D scene model of the Zhongguancun area in Beijing. Then, computer graphics, virtual reality, and rule modeling technologies are used to synthesize large-scale, realistic virtual urban traffic scenes whose fidelity and geography match the real world well. Furthermore, the Unity3D platform is used to render the artificial scenes and generate accurate ground-truth labels, e.g., semantic/instance segmentation, object bounding boxes, object tracking, optical flow, and depth. The environmental conditions in the artificial scenes can be controlled completely. As a result, we present a viable implementation pipeline for constructing large-scale artificial scenes for traffic vision research. The experimental results demonstrate that this pipeline is able to generate photorealistic virtual datasets with low modeling time and high labeling accuracy.
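To illustrate why labels derived from a rendered scene can be exact, the following minimal Python sketch computes tight object bounding boxes directly from a per-pixel instance mask. It is not the ParallelEye implementation, which relies on Unity3D; the function name `boxes_from_instance_mask`, the convention that 0 marks background, and the toy mask are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Assumed convention: 0 marks background, positive integers are instance IDs.
BACKGROUND = 0

# (x_min, y_min, x_max, y_max) in inclusive pixel coordinates.
Box = Tuple[int, int, int, int]


def boxes_from_instance_mask(mask: List[List[int]]) -> Dict[int, Box]:
    """Return the tight axis-aligned bounding box of every instance in the mask."""
    boxes: Dict[int, Box] = {}
    for y, row in enumerate(mask):
        for x, instance_id in enumerate(row):
            if instance_id == BACKGROUND:
                continue
            if instance_id not in boxes:
                # First pixel seen for this instance: start with a 1x1 box.
                boxes[instance_id] = (x, y, x, y)
            else:
                # Grow the existing box so that it also covers this pixel.
                x_min, y_min, x_max, y_max = boxes[instance_id]
                boxes[instance_id] = (
                    min(x_min, x),
                    min(y_min, y),
                    max(x_max, x),
                    max(y_max, y),
                )
    return boxes


# Toy 4x6 instance mask standing in for a rendered frame with two objects.
toy_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 2, 0],
    [0, 1, 1, 0, 2, 0],
    [0, 0, 0, 0, 2, 0],
]
boxes = boxes_from_instance_mask(toy_mask)
```

Because a renderer knows which object produced each pixel, a mask of this kind needs no manual annotation, and the boxes derived from it are exact by construction.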
