Multi-modal Traffic Scene Simulation with Road Image Sequences and GIS Information

In this paper, a new multi-modal traffic scene simulation framework is proposed that combines road image sequences with road information from Geographic Information Systems (GIS). The framework consists of two major steps. The first is a preprocessing step comprising 3D road model extraction, camera location and orientation estimation, and lane extraction from both the GIS data and the road image sequences. After preprocessing, traffic scene reconstruction is reformulated as a 6-degree-of-freedom (6DoF) pose estimation problem in the 3D road model. The iterative closest point (ICP) algorithm is then used for coarse point registration, which yields an initial estimate of the pose in the road model. An objective function is subsequently established that incorporates image features (e.g., lanes) into the road model to refine this pose estimate. In experiments on the publicly available KITTI dataset, the proposed method achieves high average Intersection-over-Union (IoU) scores when compared with the ground-truth image sequences.
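The abstract names two generic building blocks: ICP for coarse registration, which produces a rigid 6DoF pose (rotation R, translation t), and IoU for evaluation. The Python/NumPy snippet below is a minimal sketch of textbook versions of both, not the authors' implementation. Assumptions: point-to-point ICP with brute-force nearest neighbours and an SVD (Kabsch) solve per iteration; the function names `best_rigid_transform`, `icp` and `mask_iou` are hypothetical; returning 1.0 for two empty masks is an arbitrary convention. The paper's lane-based objective function for pose refinement is not reproduced here.

```python
import numpy as np


def best_rigid_transform(src, dst):
    """Least-squares R, t with R @ src_i + t ~= dst_i (Kabsch / SVD solution)."""
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    D = np.eye(3)
    D[2, 2] = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ D @ U.T
    t = dst_c - R @ src_c
    return R, t


def icp(src, dst, iters=50, tol=1e-8):
    """Point-to-point ICP: returns R, t aligning src (N, 3) to dst (M, 3)."""
    R_total = np.eye(3)
    t_total = np.zeros(3)
    cur = src.copy()
    prev_err = np.inf
    for _ in range(iters):
        # Brute-force nearest neighbours (assumption: small clouds; a k-d tree scales better).
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        nn = d2.argmin(axis=1)
        err = d2[np.arange(len(cur)), nn].mean()  # mean squared residual before this step
        R, t = best_rigid_transform(cur, dst[nn])
        cur = cur @ R.T + t
        # Compose the incremental step into the accumulated 6DoF pose.
        R_total = R @ R_total
        t_total = R @ t_total + t
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return R_total, t_total


def mask_iou(pred, gt):
    """Intersection-over-Union of two boolean masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    return float(np.logical_and(pred, gt).sum() / union) if union else 1.0
```

Because ICP only converges to a local optimum, it needs a reasonable initial pose; this is consistent with the abstract's use of ICP as a coarse step followed by a separate refinement.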
