DPDnet: A robust people detector using deep learning with an overhead depth camera

Abstract This paper proposes a deep learning-based method to detect multiple people from a single overhead depth image with high precision. Our neural network, called DPDnet, is composed of two fully-convolutional encoder-decoder blocks built with residual layers. The main block takes a depth image as input and generates a pixel-wise confidence map, where each detected person in the image is represented by a Gaussian-like distribution. The refinement block combines the depth image and the output of the main block to refine the confidence map. Both blocks are trained simultaneously, end-to-end, using depth images and ground-truth head position labels. The paper provides a rigorous experimental comparison with some of the best state-of-the-art methods, evaluated exhaustively on several publicly available datasets. DPDnet outperforms all the evaluated methods with statistically significant differences, and with accuracies that exceed 99%. The system was trained on one of the datasets (generated by the authors and available to the scientific community) and evaluated on the others without retraining, which shows that it also achieves high accuracy across varying datasets and experimental conditions. Additionally, we compared our proposal with other CNN-based alternatives very recently proposed in the literature, again obtaining very high performance. Finally, the computational complexity of our proposal is shown to be independent of the number of users in the scene, and the system runs in real time on conventional GPUs.
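To make the confidence-map representation concrete, the following is a minimal, dependency-free sketch of how a map with one Gaussian-like blob per head position could be built from ground-truth labels, and how person positions could be read back from such a map. It is an illustration under stated assumptions, not the authors' implementation: the function names `confidence_map` and `find_peaks`, the spread `sigma`, the `threshold`, the use of a maximum to merge overlapping blobs, and the local-maximum peak search are all hypothetical choices that the abstract does not specify.

```python
import math


def confidence_map(height, width, heads, sigma=2.0):
    """Build a pixel-wise map with one Gaussian-like blob per head position.

    heads: list of (row, col) head centres (ground-truth labels).
    sigma: hypothetical blob spread in pixels (not taken from the paper).
    Overlapping blobs are merged with max, so every centre keeps value 1.0.
    """
    cmap = [[0.0] * width for _ in range(height)]
    for r0, c0 in heads:
        for r in range(height):
            for c in range(width):
                d2 = (r - r0) ** 2 + (c - c0) ** 2
                value = math.exp(-d2 / (2.0 * sigma ** 2))
                if value > cmap[r][c]:
                    cmap[r][c] = value
    return cmap


def find_peaks(cmap, threshold=0.5):
    """Return (row, col) of strict local maxima at or above threshold.

    Assumed post-processing step: a pixel is reported when it is greater
    than all of its neighbours in the 8-neighbourhood.
    """
    height, width = len(cmap), len(cmap[0])
    peaks = []
    for r in range(height):
        for c in range(width):
            v = cmap[r][c]
            if v < threshold:
                continue
            neighbours = [
                cmap[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(v > n for n in neighbours):
                peaks.append((r, c))
    return peaks


if __name__ == "__main__":
    # Two hypothetical head positions in a 20 x 30 image.
    target = confidence_map(20, 30, [(5, 5), (12, 20)])
    print(find_peaks(target))
```

In this sketch the per-pixel work depends only on the image size once the map exists, which is in line with the abstract's remark that the cost of the network does not grow with the number of people in the scene; the actual network and its post-processing may differ.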
