A cross-modal fusion based approach with scale-aware deep representation for RGB-D crowd counting and density estimation

Abstract To estimate the crowd count and the density map simultaneously from crowd images, this paper presents a novel cross-modal fusion-based approach for RGB-D crowd counting. In the RGB-D crowd counting task, depth data is often used in the head-detection procedure to improve counting performance by reducing the underestimation caused by small heads. Unlike these traditional methods of using the depth image, the proposed approach is designed as a density-estimation-based regression framework that learns a richer deep representation from the original images through cross-modal interactions at multiple locations in the framework, which benefits crowd counting in various scenes, especially congested ones. In addition, global and local contexts are modeled to help the approach learn a more adequate scale-aware representation for the counting task. Extensive experiments on the MICC and large-scale ShanghaiRGBD benchmarks demonstrate that the proposed approach outperforms state-of-the-art methods for RGB-D crowd counting and density estimation. Furthermore, the proposed approach can be extended to the RGB crowd counting task, where experimental results show that it achieves performance comparable to existing crowd counting methods.
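To illustrate how a density-estimation framework yields both outputs at once, the following is a minimal sketch of the generic density-map formulation, in which the crowd count is the sum (integral) of the density map. It is not the paper's implementation: the function name `density_map`, the fixed Gaussian width `sigma`, the image size, and the head coordinates are all assumptions chosen for illustration, and the abstract does not specify which kernel the authors use.

```python
import numpy as np


def density_map(points, shape, sigma=4.0):
    """Sum one normalized Gaussian per annotated head position.

    points: iterable of (x, y) head coordinates (hypothetical annotations).
    shape:  (height, width) of the output map.
    sigma:  Gaussian width in pixels (an assumed, fixed value).
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    dmap = np.zeros(shape, dtype=np.float64)
    for px, py in points:
        g = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2.0 * sigma ** 2))
        # Normalize so that each head contributes exactly 1 to the total.
        dmap += g / g.sum()
    return dmap


# Three hypothetical head annotations in a 64 x 96 image.
heads = [(20, 30), (50, 12), (51, 14)]
dmap = density_map(heads, shape=(64, 96))

# The estimated count is the sum over the density map.
count = dmap.sum()
print(round(count, 6))
```

In a regression framework of this kind, a network would be trained to predict `dmap` from the input images, and the count would then be read off by summing the predicted map.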
