A Deep Neural Network Approach for Top View People Detection and Counting

People detection and counting is one of the important applications of video surveillance. Various computer vision and deep learning based methods have been developed that aim to provide efficient and accurate people detection and counting results using frontal view data sets. However, detecting people poses many challenges, including occlusion, perspective distortion, and variations in human body pose, size, and orientation; these challenges affect the results of detection and counting models. In this work, a deep neural network approach, SSD (Single Shot MultiBox Detector), is explored for people detection and counting. The SSD model is used to detect and count people from a significantly different viewpoint, i.e., the top view. To the best of our knowledge, this is the first attempt to use a deep neural network based model for top view people detection and counting. Furthermore, the impact of a frontal view trained SSD model on top view test images is also discussed. The experimental results show the effectiveness of the deep learning model, which achieves promising results with an average true positive rate (TPR) of 95% and 94.42% for indoor and outdoor environments, respectively.
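
As a rough illustration of how a per-frame count and a TPR figure such as those reported above could be derived from detector output, the following minimal sketch may help. It is not the authors' implementation: the `Detection` record, the `count_people` and `true_positive_rate` helpers, the "person" label string, and the 0.5 score threshold are all hypothetical choices made for this example. The only part taken as given is the standard definition TPR = TP / (TP + FN).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Detection:
    """Hypothetical record for one detector output (not from the paper)."""
    label: str    # predicted class name
    score: float  # detector confidence in [0, 1]


def count_people(detections: Iterable[Detection],
                 score_threshold: float = 0.5) -> int:
    """Count detections labelled 'person' at or above an assumed threshold."""
    return sum(
        1 for d in detections
        if d.label == "person" and d.score >= score_threshold
    )


def true_positive_rate(true_positives: int, false_negatives: int) -> float:
    """Standard TPR (recall): TP / (TP + FN)."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("TPR is undefined when there are no positive samples")
    return true_positives / total


if __name__ == "__main__":
    # Made-up detections for a single frame, for illustration only.
    frame = [
        Detection("person", 0.91),
        Detection("person", 0.42),  # below the assumed threshold
        Detection("chair", 0.88),   # not a person
        Detection("person", 0.77),
    ]
    print("people counted:", count_people(frame))
    print("TPR:", true_positive_rate(true_positives=19, false_negatives=1))
```

In practice the threshold would be tuned on held-out data, and the TP and FN counts would come from matching detections against annotated ground truth, a step this sketch leaves out.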
