An object counting network based on hierarchical context and feature fusion

Abstract

Object counting is a challenging task in computer vision. In this paper, we propose HFNet, an object counting network based on hierarchical context and feature fusion. HFNet comprises a hierarchical context extraction module and an end-to-end convolutional neural network (the main network). The hierarchical context extraction module extracts hierarchical features and passes them to the main network as context cues, with the aim of providing additional information to improve counting performance. The main network adds lower-level feature maps, which retain high spatial resolution, into higher-level, semantically richer feature maps. This has two benefits: first, it reduces the risk of losing fine detail over successive convolutions; second, fusing multi-scale feature maps helps the network cope with the scale variations common in this task. Experiments show that HFNet achieves competitive results on crowd counting (the UCF_CC_50 and ShanghaiTech datasets) and on vehicle counting (the TRANCOS dataset). Comparative experiments further verify the soundness of HFNet's architecture.
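The feature fusion idea described above (adding a high-resolution, low-level map into a lower-resolution, high-level map) can be illustrated with a minimal NumPy sketch. The abstract does not say how the two resolutions are matched, so this sketch assumes 2x2 average pooling of the low-level map followed by element-wise addition, and assumes both maps already have the same channel count (a real network would typically project channels with a learned convolution). The names `avg_pool2x2`, `fuse`, and `fuse_pyramid` are hypothetical; this is an illustration of the general technique, not HFNet's actual implementation.

```python
import numpy as np


def avg_pool2x2(x: np.ndarray) -> np.ndarray:
    """2x2 average pooling with stride 2 over the spatial axes of a (C, H, W) array.

    Assumes H and W are even; odd sizes would need padding or cropping first.
    """
    c, h, w = x.shape
    # Split each spatial axis into (blocks, 2) and average over the two size-2 axes.
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def fuse(low_level: np.ndarray, high_level: np.ndarray) -> np.ndarray:
    """Add a low-level (high-resolution) map into a high-level (low-resolution) map.

    Hypothetical fusion rule: downsample the low-level map to the high-level
    resolution, then sum element-wise. Channel counts must already agree.
    """
    pooled = avg_pool2x2(low_level)
    if pooled.shape != high_level.shape:
        raise ValueError(f"shape mismatch: {pooled.shape} vs {high_level.shape}")
    return pooled + high_level


def fuse_pyramid(levels: list) -> np.ndarray:
    """Fold maps ordered from finest to coarsest, each half the previous resolution."""
    fused = levels[0]
    for coarser in levels[1:]:
        # Carry detail from all finer levels into the next, more semantic level.
        fused = fuse(fused, coarser)
    return fused


if __name__ == "__main__":
    low = np.arange(16, dtype=float).reshape(1, 4, 4)   # fine, low-level map
    high = np.ones((1, 2, 2))                           # coarse, high-level map
    print(fuse(low, high).shape)                        # (1, 2, 2)
```

Pooling the fine map down (rather than upsampling the coarse map) keeps the output at the high-level resolution, matching the abstract's description of adding low-level maps "into" high-level ones; an upsample-and-add variant is an equally plausible alternative the abstract does not rule out.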
