A 3D Dynamic Scene Analysis Framework for Development of Intelligent Transportation Systems

Holistic driving scene understanding is a critical step toward intelligent transportation systems. It involves different levels of analysis, interpretation, reasoning, and decision making. In this paper, we propose a 3D dynamic scene analysis framework as a first step toward driving scene understanding. Specifically, given a sequence of synchronized 2D and 3D sensory data, the framework systematically integrates different perception modules to obtain the 3D position, orientation, velocity, and category of traffic participants and the ego car within a reconstructed, semantically labeled 3D traffic scene. We implement this framework and demonstrate its effectiveness in challenging urban driving scenarios. The proposed framework builds a foundation for higher-level driving scene understanding problems such as intention and motion prediction of surrounding entities, ego motion planning, and decision making.
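To make the described data flow concrete, the sketch below shows one plausible shape for the framework's per-frame output and fusion step. This is a minimal illustration, not the paper's implementation: the names (ParticipantState, EgoPose, detect_and_localize, analyze_frame), the yaw-only pose, and the stubbed perception call are all our own simplifying assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class ParticipantState:
    """World-frame estimate for one traffic participant:
    the 3D position, orientation, velocity, and category
    that the framework reports."""
    position: Vec3   # x, y, z in meters
    yaw: float       # heading in radians
    velocity: Vec3   # m/s
    category: str    # e.g. "car", "cyclist", "pedestrian"

@dataclass
class EgoPose:
    """Ego-vehicle pose, assumed to come from an odometry/localization module."""
    position: Vec3
    yaw: float

def detect_and_localize(image, point_cloud) -> List[ParticipantState]:
    """Stand-in for the perception modules: 2D detection on the image
    followed by 3D localization against the synchronized point cloud.
    Returns ego-frame estimates; a real system would plug in learned
    detectors and a tracker here."""
    return []  # placeholder output

def to_world_frame(obj: ParticipantState, ego: EgoPose) -> ParticipantState:
    """Rotate and translate an ego-frame estimate into the world frame
    (planar yaw-only transform, a simplifying assumption)."""
    c, s = math.cos(ego.yaw), math.sin(ego.yaw)
    x, y, z = obj.position
    vx, vy, vz = obj.velocity
    return ParticipantState(
        position=(ego.position[0] + c * x - s * y,
                  ego.position[1] + s * x + c * y,
                  ego.position[2] + z),
        yaw=obj.yaw + ego.yaw,
        velocity=(c * vx - s * vy, s * vx + c * vy, vz),
        category=obj.category,
    )

def analyze_frame(image, point_cloud, ego: EgoPose) -> List[ParticipantState]:
    """One step of the hypothesized per-frame pipeline: perceive in the
    ego frame, then express every estimate in the world frame."""
    return [to_world_frame(o, ego) for o in detect_and_localize(image, point_cloud)]
```

In a full system, each stub would be replaced by a dedicated module (2D detection, 3D localization, ego odometry, multi-object tracking), and the accumulated world-frame states would feed the downstream prediction and planning tasks the abstract mentions.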
