Crowd Counting with Semantic Scene Segmentation in Helicopter Footage

Crowd counting neural networks have improved steadily in recent years, and their accuracy has reached such high levels that further improvement is becoming very difficult. However, despite this accuracy, these networks capture no deeper semantic information, such as social roles (e.g., student, company worker, or police officer) or location-based roles (e.g., pedestrian, tenant, or construction worker). Some of these roles can be learned from the same set of features that identify an entity as human, whereas others require wider contextual information from the person's surroundings. The primary end goal of developing recognition software is to use it in autonomous decision-making systems. It must therefore be foolproof; that is, it must have a good semantic understanding of its input. In this study, we focus on counting pedestrians in helicopter footage and introduce a dataset created from helicopter videos for this purpose. We use semantic segmentation to extract the required additional contextual information from the surroundings of an entity, and we demonstrate that this increases pedestrian counting accuracy. Furthermore, we show that crowd counting and semantic segmentation can be achieved simultaneously, with comparable or even improved accuracy, by using the same crowd counting neural network for both tasks through hard parameter sharing. The presented method is generic and can be applied to arbitrary crowd density estimation methods. A link to the dataset is available at the end of the paper.
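To make the idea of hard parameter sharing concrete, the following is a minimal sketch and not the architecture used in the paper. It assumes a toy per-pixel model in which one shared trunk feeds two task-specific heads: one predicts a non-negative density value and the other predicts a semantic class. All names (`SharedTrunkModel`, `estimated_count`), layer sizes, and the random weights are hypothetical and chosen only for illustration.

```python
import random


def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]


def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_linear(rng, n_out, n_in):
    """Random weights and zero biases (illustrative, untrained)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class SharedTrunkModel:
    """Toy per-pixel model: one shared trunk, two task-specific heads."""

    def __init__(self, n_channels=3, n_hidden=8, n_classes=4, seed=0):
        rng = random.Random(seed)
        # Hard parameter sharing: both tasks read the same trunk weights.
        self.trunk = init_linear(rng, n_hidden, n_channels)
        self.density_head = init_linear(rng, 1, n_hidden)
        self.seg_head = init_linear(rng, n_classes, n_hidden)

    def forward(self, image):
        """Return (density_map, label_map) for an H x W x C nested list."""
        density_map, label_map = [], []
        for row in image:
            density_row, label_row = [], []
            for pixel in row:
                # Shared features are computed once and reused by both heads.
                features = relu(linear(*self.trunk, pixel))
                density = relu(linear(*self.density_head, features))[0]
                logits = linear(*self.seg_head, features)
                density_row.append(density)
                label_row.append(max(range(len(logits)),
                                     key=logits.__getitem__))
            density_map.append(density_row)
            label_map.append(label_row)
        return density_map, label_map


def estimated_count(density_map):
    """The crowd count is the sum over the predicted density map."""
    return sum(sum(row) for row in density_map)


if __name__ == "__main__":
    rng = random.Random(1)
    image = [[[rng.random() for _ in range(3)] for _ in range(5)]
             for _ in range(4)]
    density_map, label_map = SharedTrunkModel().forward(image)
    print("estimated count:", estimated_count(density_map))
    print("label map:", label_map)
```

In a trained model of this kind, the losses of both tasks would update the shared trunk, so features useful for segmentation are also available to the counting head. The sketch shows only the forward pass.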
