Deep Detection of People and their Mobility Aids for a Hospital Robot

Robots operating in populated environments encounter many different types of people, some of whom might have an advanced need for cautious interaction, because of physical impairments or their advanced age. Robots therefore need to recognize such advanced demands to provide appropriate assistance, guidance or other forms of support. In this paper, we propose a depth-based perception pipeline that estimates the position and velocity of people in the environment and categorizes them according to the mobility aids they use: pedestrian, person in wheelchair, person in a wheelchair with a person pushing them, person with crutches and person using a walker. We present a fast region proposal method that feeds a Region-based Convolutional Network (Fast R- CNN [1]). With this, we speed up the object detection process by a factor of seven compared to a dense sliding window approach. We furthermore propose a probabilistic position, velocity and class estimator to smooth the CNN's detections and account for occlusions and misclassifications. In addition, we introduce a new hospital dataset with over 17,000 annotated RGB-D images. Extensive experiments confirm that our pipeline successfully keeps track of people and their mobility aids, even in challenging situations with multiple people from different categories and frequent occlusions.

[1]  Huimin Ma,et al.  3D Object Proposals for Accurate Object Class Detection , 2015, NIPS.

[2]  Kai Oliver Arras,et al.  On multi-modal people tracking from mobile platforms in very crowded and dynamic environments , 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[3]  Matteo Munaro,et al.  Fast RGB-D people tracking for service robots , 2014, Auton. Robots.

[4]  Lucas Beyer,et al.  DROW: Real-Time Deep Learning-Based Wheelchair Detection in 2-D Range Data , 2016, IEEE Robotics and Automation Letters.

[5]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[6]  Roland Siegwart,et al.  Multimodal detection and tracking of pedestrians in urban environments with explicit ground plane extraction , 2008, 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[7]  Kai Oliver Arras,et al.  Real-time full-body human gender recognition in (RGB)-D data , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[8]  Rainer Stiefelhagen,et al.  Multiple Object Tracking Performance Metrics and Evaluation in a Smart Room Environment , 2006 .

[9]  Bastian Leibe,et al.  Person Attribute Recognition with a Jointly-Trained Holistic CNN Model , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[10]  Subhransu Maji,et al.  Describing people: A poselet-based approach to attribute classification , 2011, 2011 International Conference on Computer Vision.

[11]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[12]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Kai Oliver Arras,et al.  People detection in RGB-D data , 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[14]  Wolfram Burgard,et al.  Choosing smartly: Adaptive multimodal fusion for object detection in changing environments , 2016, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[15]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Luc Van Gool,et al.  Robust Multiperson Tracking from a Mobile Platform , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Gaurav Sharma,et al.  Learning discriminative spatial representation for image classification , 2011, BMVC.

[18]  Bastian Leibe,et al.  Real-time RGB-D based people detection and tracking for mobile robots and head-worn cameras , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[19]  Bastian Leibe,et al.  Efficient Use of Geometric Constraints for Sliding-Window Object Detection in Video , 2011, ICVS.

[20]  Silvio Savarese,et al.  Ieee Transaction on Pattern Analysis and Machine Intelligence 1 a General Framework for Tracking Multiple People from a Moving Camera , 2022 .

[21]  Marc Hanheide,et al.  Real-time multisensor people tracking for human-robot spatial interaction , 2015 .