Dynamic Momentum Adaptation for Zero-Shot Cross-Domain Crowd Counting

Zero-shot cross-domain crowd counting is a challenging task in which a crowd counting model is trained on a source domain (i.e., a training dataset) and no additional labeled or unlabeled data is available for fine-tuning when the model is tested on an unseen target domain (i.e., a different test dataset). The generalisation performance of existing crowd counting methods is typically limited by the large gap between source and target domains. Here, we propose a novel Crowd Counting framework built upon an external Momentum Template, termed C2MoT, which encodes domain-specific information in an external template representation. Specifically, the Momentum Template (MoT) is learned with momentum updates during offline training and is then dynamically updated for each test image during online cross-dataset evaluation. Thanks to the dynamically updated MoT, C2MoT generates dense target correspondences that explicitly account for head regions, and then predicts the density map from the normalized correspondence map. Experiments on large-scale datasets show that C2MoT achieves leading zero-shot cross-domain crowd counting performance without model fine-tuning, and even outperforms domain adaptation methods that fine-tune on target-domain data. Moreover, C2MoT achieves state-of-the-art counting performance on the source domain.
