DROW: Real-Time Deep Learning-Based Wheelchair Detection in 2-D Range Data

We introduce the DROW detector, a deep learning-based object detector that operates on 2-dimensional (2-D) range data. Laser scanners are invariant to lighting, provide accurate 2-D range data, and typically cover a large field of view, which makes them attractive sensors for robotics applications. So far, research on detection in 2-D laser range data has been dominated by hand-crafted features and boosted classifiers, which potentially lose performance due to suboptimal design choices. We instead propose a detector based on a convolutional neural network (CNN). We show how to apply CNNs effectively to detection in 2-D range data, and propose a depth preprocessing step and a voting scheme that significantly improve CNN performance. We demonstrate our approach on wheelchairs and walkers, obtaining state-of-the-art detection results. Apart from the training data, however, none of our design choices limits the detector to these two classes. We provide a ROS node for our detector and release our dataset of 464k laser scans, of which 24k are annotated.

[1]  Evangeline Pollard,et al.  2D laser based road obstacle classification for road safety improvement , 2015, 2015 IEEE International Workshop on Advanced Robotics and its Social Impacts (ARSO).

[2]  Marc Hanheide,et al.  Real-time multisensor people tracking for human-robot spatial interaction , 2015 .

[3]  Horst Bischof,et al.  Hough Networks for Head Pose Estimation and Facial Feature Localization , 2014, BMVC.

[4]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[5]  Jonathan Tompson,et al.  Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation , 2014, NIPS.

[6]  David D. Cox,et al.  Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures , 2013, ICML.

[7]  Ingmar Posner,et al.  End-to-End Tracking and Semantic Segmentation Using Recurrent Neural Networks , 2016, ArXiv.

[8]  Mohan M. Trivedi,et al.  Looking at Vehicles on the Road: A Survey of Vision-Based Vehicle Detection, Tracking, and Behavior Analysis , 2013, IEEE Transactions on Intelligent Transportation Systems.

[9]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[10]  Ryo Kurazume,et al.  Multi-Part People Detection Using 2D Range Data , 2010, Int. J. Soc. Robotics.

[11]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Razvan Pascanu,et al.  Theano: new features and speed improvements , 2012, ArXiv.

[13]  Roland Siegwart,et al.  A Layered Approach to People Detection in 3D Range Data , 2010, AAAI.

[14]  Bernt Schiele,et al.  Robust Object Detection with Interleaved Categorization and Segmentation , 2008, International Journal of Computer Vision.

[15]  Surya Ganguli,et al.  Exact solutions to the nonlinear dynamics of learning in deep linear neural networks , 2013, ICLR.

[16]  Ingmar Posner,et al.  Voting for Voting in Online Point Cloud Object Detection , 2015, Robotics: Science and Systems.

[17]  Pau-Choo Chung,et al.  Wheelchair Detection Using Cascaded Decision Tree , 2010, IEEE Transactions on Information Technology in Biomedicine.

[18]  Cyril Cauchois,et al.  Robotic assistance: an automatic wheelchair tracking and following functionality by omnidirectional vision , 2005, 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[19]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[20]  Seyed-Ahmad Ahmadi,et al.  Hough-CNN: Deep learning for segmentation of deep brain regions in MRI and ultrasound , 2016, Comput. Vis. Image Underst..

[21]  António E. Ruano,et al.  Fast Line, Arc/Circle and Leg Detection from Laser Scan Data in a Player Driver , 2005, Proceedings of the 2005 IEEE International Conference on Robotics and Automation.

[22]  Dieter Fox,et al.  Unsupervised feature learning for 3D scene labeling , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[23]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[24]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[25]  Lucas Beyer,et al.  Biternion Nets: Continuous Head Pose Regression from Discrete Training Labels , 2015, GCPR.

[26]  Horst-Michael Groß,et al.  People detection and distinction of their walking aids in 2D laser range data based on generic distance-invariant features , 2014, The 23rd IEEE International Symposium on Robot and Human Interactive Communication.

[27]  John D. Hunter,et al.  Matplotlib: A 2D Graphics Environment , 2007, Computing in Science & Engineering.

[28]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Yoshua Bengio,et al.  Deep Sparse Rectifier Neural Networks , 2011, AISTATS.

[30]  Hai Su,et al.  Deep Voting: A Robust Approach Toward Nucleus Localization in Microscopy Images , 2015, MICCAI.

[31]  Sebastian Scherer,et al.  VoxNet: A 3D Convolutional Neural Network for real-time object recognition , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[32]  Roland Siegwart,et al.  Human detection using multimodal and multidimensional features , 2008, 2008 IEEE International Conference on Robotics and Automation.

[33]  Peter Kulchyski and , 2015 .

[34]  Mubarak Shah,et al.  Wheelchair Detection in a Calibrated Environment , 2002 .

[35]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[36]  Wolfram Burgard,et al.  Using Boosted Features for the Detection of People in 2D Range Data , 2007, Proceedings 2007 IEEE International Conference on Robotics and Automation.