Cross-View Cross-Scene Multi-View Crowd Counting

Multi-view crowd counting uses multiple cameras to extend the field of view beyond that of a single camera, capturing more people in the scene and improving counting performance for people who are occluded or appear at low resolution. However, the current multi-view paradigm trains and tests on the same single scene and camera views, which limits its practical application. In this paper, we propose a cross-view cross-scene (CVCS) multi-view crowd counting paradigm, in which training and testing occur on different scenes with arbitrary camera layouts. Two challenges arise: choosing the optimal view fusion as the scene and camera layout change, and handling non-correspondence noise caused by camera calibration errors or erroneous features. To address them, we propose a CVCS model that attentively selects and fuses multiple views using camera layout geometry, together with a noise view regularization method that trains the model to handle non-correspondence errors. We also generate a large synthetic multi-camera crowd counting dataset with many scenes and camera views to capture many possible variations, which avoids the difficulty of collecting and annotating such a large real dataset. We then test the trained CVCS model on real multi-view counting datasets using unsupervised domain transfer. The proposed CVCS model trained on synthetic data outperforms the same model trained only on real data, and achieves promising performance compared to fully supervised methods that train and test on the same single scene.
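To make the idea of attentive view fusion concrete, the following is a minimal sketch and not the CVCS architecture itself. It assumes each view's features have already been projected onto a common ground-plane grid, and that a per-view, per-location score is available (in the paper such scores would be derived from camera layout geometry; here they are simply passed in). The function name `fuse_views` and the argument names are hypothetical.

```python
import numpy as np

def fuse_views(view_feats, view_scores):
    """Fuse V ground-plane feature maps with per-location softmax weights.

    view_feats:  array of shape (V, C, H, W), features already projected
                 to a shared ground-plane grid (assumed, not computed here).
    view_scores: array of shape (V, H, W), unnormalized score for each view
                 at each ground-plane location (assumed to be given).
    Returns the fused map of shape (C, H, W) and the weights (V, H, W).
    """
    view_feats = np.asarray(view_feats, dtype=np.float64)
    view_scores = np.asarray(view_scores, dtype=np.float64)

    # Numerically stable softmax across the view axis.
    shifted = view_scores - view_scores.max(axis=0, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=0, keepdims=True)

    # Weighted sum over views; weights broadcast over the channel axis.
    fused = (weights[:, None, :, :] * view_feats).sum(axis=0)
    return fused, weights
```

With equal scores this reduces to plain averaging over views, while a strongly dominant score at a location effectively selects a single view there, which is the "select and fuse" behavior the abstract describes in general terms.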
