Exploring Deep Learning Models for Overhead View Multiple Object Detection

The Internet of Things (IoT), with its smart sensors, collects and generates big data streams for a wide range of applications. One important application is video analytics, which includes object detection. Object detection has become an important research area, particularly since the development of deep neural networks. We demonstrate the applicability, effectiveness, and efficiency of two convolutional neural network algorithms, Faster-RCNN and Mask-RCNN, for overhead view multiple object detection and segmentation, to facilitate video analytics in the IoT domain. We used Faster-RCNN and Mask-RCNN models trained on a frontal view data set. To evaluate the performance of both algorithms, we used a newly recorded overhead view data set containing images of different objects with variation in field of view, background, illumination conditions, poses, scales, sizes, angles, height, aspect ratio, and camera resolution. Although the overhead view appearance of an object differs significantly from its frontal view appearance, the experimental results are promising and show the potential of the deep learning models. For Faster-RCNN, we achieved a true-positive rate (TPR) of 94% with a false-positive rate (FPR) of 0.4% for overhead view images of persons, while for other objects the maximum TPR obtained is 92%. The Mask-RCNN model produced a TPR of 93% with an FPR of 0.5% for person images and a maximum TPR of 92% for other objects. Furthermore, we discuss the output results in detail, highlighting the challenges and possible future directions.
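The TPR and FPR figures reported above can be read against the standard definitions, TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The following is a minimal sketch of that arithmetic, not the authors' evaluation code: the abstract does not state how negatives were counted for the detection task, the function name `detection_rates` is hypothetical, and the counts in the usage lines are invented purely to illustrate the calculation.

```python
def detection_rates(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float]:
    """Return (TPR, FPR) from raw counts using the standard definitions.

    TPR = TP / (TP + FN): fraction of actual positives that were detected.
    FPR = FP / (FP + TN): fraction of actual negatives that were flagged.
    """
    if tp + fn == 0:
        raise ValueError("TPR is undefined without any positive samples")
    if fp + tn == 0:
        raise ValueError("FPR is undefined without any negative samples")
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical counts, chosen only for illustration (not taken from the paper).
tpr, fpr = detection_rates(tp=94, fn=6, fp=2, tn=498)
print(f"TPR = {tpr:.1%}, FPR = {fpr:.1%}")
```

Reporting both rates together matters because either one alone can be improved trivially: a detector that flags everything reaches a TPR of 100% at the cost of a very high FPR.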
