DHA: Lidar and Vision data Fusion-based On Road Object Classifier

In this paper, we first extract three kinds of high-level features from the LIDAR point cloud and combine them into the DHA (Depth, Height, and Angle) channels. Integrating these channels with the conventional RGB image from the camera, we build a rich-feature road object classifier by training a deep convolutional neural network on six-channel (RGBDHA) data, so that the network receives both spatial and RGB information. With the additional upsampled LIDAR data, the classifier reaches higher accuracy than methods based on RGB images alone. Several simulations on the well-known KITTI autonomous vehicle benchmark show that our fusion-based classifier outperforms RGB-based approaches by about 15% and reaches an average accuracy of 96%.
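The six-channel input described above can be illustrated with a minimal sketch. The abstract does not define how the D, H, and A channels are computed, so everything below is an assumption made for illustration: the function names `scale_to_uint8` and `build_rgbdha` are hypothetical, the per-pixel point map is assumed to be already upsampled and aligned with the image, the frame is assumed to have x forward, y left, and z up, and the angle is taken to be the elevation angle of each point as seen from the sensor, which may differ from the definition used in the paper.

```python
import numpy as np


def scale_to_uint8(channel):
    """Linearly rescale a float channel to the 0..255 range."""
    lo, hi = float(channel.min()), float(channel.max())
    if hi == lo:
        # Constant channel: avoid division by zero and return all zeros.
        return np.zeros(channel.shape, dtype=np.uint8)
    return np.round((channel - lo) / (hi - lo) * 255.0).astype(np.uint8)


def build_rgbdha(rgb, points):
    """Stack an RGB image with D, H, A channels into an (H, W, 6) array.

    rgb:    (H, W, 3) uint8 image.
    points: (H, W, 3) float array holding one upsampled LIDAR point per
            pixel, ASSUMED to be in a frame with x forward, y left, z up.
    """
    x, y, z = points[..., 0], points[..., 1], points[..., 2]
    ground_range = np.hypot(x, y)

    depth = x                             # assumption: forward distance
    height = z                            # assumption: height in the sensor frame
    angle = np.arctan2(z, ground_range)   # assumption: elevation angle

    dha = np.stack(
        [scale_to_uint8(c) for c in (depth, height, angle)], axis=-1
    )
    # Channel order is R, G, B, D, H, A.
    return np.concatenate([rgb, dha], axis=-1)
```

Rescaling each LIDAR-derived channel to the same 8-bit range as the RGB channels is one simple way to let a single network consume all six channels together; per-image min-max scaling is used here only for brevity, and a fixed, dataset-wide scaling would be an equally reasonable choice.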
