AdaCrowd: Unlabeled Scene Adaptation for Crowd Counting

We address the problem of image-based crowd counting. In particular, we propose a new problem called unlabeled scene adaptive crowd counting. Given a new target scene, we would like to have a crowd counting model specifically adapted to this particular scene based on the target data that capture some information about the new scene. In this paper, we propose to use one or more unlabeled images from the target scene to perform the adaptation. In comparison with the existing problem setups (e.g. fully supervised), our proposed problem setup is closer to the real-world applications of crowd counting systems. We introduce a novel AdaCrowd framework to solve this problem. Our framework consists of a crowd counting network and a guiding network. The guiding network predicts some parameters in the crowd counting network based on the unlabeled images from a particular scene. This allows our model to adapt to different target scenes. The experimental results on several challenging benchmark datasets demonstrate the effectiveness of our proposed approach compared with other alternative methods.

[1]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[2]  Antoni B. Chan,et al.  Incorporating Side Information by Adaptive Convolution , 2017, International Journal of Computer Vision.

[3]  Joshua B. Tenenbaum,et al.  One-Shot Learning with a Hierarchical Nonparametric Bayesian Model , 2011, ICML Unsupervised and Transfer Learning.

[4]  Gregory R. Koch,et al.  Siamese Neural Networks for One-Shot Image Recognition , 2015 .

[5]  Xiangjian He,et al.  DENet: A Universal Network for Counting Crowd With Varying Densities and Scales , 2019, IEEE Transactions on Multimedia.

[6]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[7]  Pietro Perona,et al.  One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  R. Venkatesh Babu,et al.  Almost Unsupervised Learning for Dense Crowd Counting , 2019, AAAI.

[9]  Antoni B. Chan,et al.  Wide-Area Crowd Counting via Ground-Plane Density Maps and Multi-View Fusion CNNs , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Hong Yu,et al.  Meta Networks , 2017, ICML.

[11]  J. Ferryman,et al.  PETS2009: Dataset and challenge , 2009, 2009 Twelfth IEEE International Workshop on Performance Evaluation of Tracking and Surveillance.

[12]  Hugo Larochelle,et al.  Optimization as a Model for Few-Shot Learning , 2016, ICLR.

[13]  J. Schulman,et al.  Reptile: a Scalable Metalearning Algorithm , 2018 .

[14]  Yuhong Li,et al.  CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15]  Andrew Zisserman,et al.  Learning To Count Objects in Images , 2010, NIPS.

[16]  Yang Wang,et al.  One-Shot Scene-Specific Crowd Counting , 2019, BMVC.

[17]  Xiaochun Cao,et al.  Deep People Counting in Extremely Dense Crowds , 2015, ACM Multimedia.

[18]  Lior Wolf,et al.  Learning to Count with CNN Boosting , 2016, ECCV.

[19]  Tao Xiang,et al.  Learning to Compare: Relation Network for Few-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[20]  Ramprasaath R. Selvaraju,et al.  Counting Everyday Objects in Everyday Scenes , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Shaogang Gong,et al.  Feature Mining for Localised Crowd Counting , 2012, BMVC.

[22]  Pei Lv,et al.  Density-Aware Multi-Task Learning for Crowd Counting , 2020, IEEE Transactions on Multimedia.

[23]  Alexander Hauptmann,et al.  Learning Spatial Awareness to Improve Crowd Counting , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[24]  Yihong Gong,et al.  Bayesian Loss for Crowd Count Estimation With Point Supervision , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[25]  Bo Hu,et al.  Locality-Constrained Spatial Transformer Network for Video Crowd Counting , 2019, 2019 IEEE International Conference on Multimedia and Expo (ICME).

[26]  Jaakko Lehtinen,et al.  Few-Shot Unsupervised Image-to-Image Translation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[27]  Hugo Larochelle,et al.  Modulating early visual processing by language , 2017, NIPS.

[28]  Vishal M. Patel,et al.  CNN-Based cascaded multi-task learning of high-level prior and density estimation for crowd counting , 2017, 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS).

[29]  Daan Wierstra,et al.  Meta-Learning with Memory-Augmented Neural Networks , 2016, ICML.

[30]  Xiyang Liu,et al.  Adaptive Mixture Regression Network with Local Counting Map for Crowd Counting , 2020, ECCV.

[31]  Antoni B. Chan,et al.  Adaptive Density Map Generation for Crowd Counting , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[32]  Nikos Paragios,et al.  A MRF-based approach for real-time subway monitoring , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[33]  Shenghua Gao,et al.  Single-Image Crowd Counting via Multi-Column Convolutional Neural Network , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Shiv Surya,et al.  Switching Convolutional Neural Network for Crowd Counting , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  A. Marana,et al.  On the efficacy of texture analysis for crowd monitoring , 1998, Proceedings SIBGRAPI'98. International Symposium on Computer Graphics, Image Processing, and Vision (Cat. No.98EX237).

[36]  Ryuzo Okada,et al.  COUNT Forest: CO-Voting Uncertain Number of Targets Using Random Forest for Crowd Density Estimation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[37]  Antoni B. Chan,et al.  Crowd Counting by Adaptively Fusing Predictions from an Image Pyramid , 2018, BMVC.

[38]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[39]  Shaogang Gong,et al.  From Semi-supervised to Transfer Counting of Crowds , 2013, 2013 IEEE International Conference on Computer Vision.

[40]  Yang Wang,et al.  Few-Shot Scene Adaptive Crowd Counting Using Meta-Learning , 2020, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).

[41]  Pascal Fua,et al.  Context-Aware Crowd Counting , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Michael I. Jordan,et al.  Deep Transfer Learning with Joint Adaptation Networks , 2016, ICML.

[43]  Xiaogang Wang,et al.  Cross-scene crowd counting via deep convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Richard S. Zemel,et al.  Prototypical Networks for Few-shot Learning , 2017, NIPS.

[45]  Nuno Vasconcelos,et al.  Privacy preserving crowd monitoring: Counting people without people models or tracking , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[46]  Haizhou Ai,et al.  End-to-end crowd counting via joint learning local and global count , 2016, 2016 IEEE International Conference on Image Processing (ICIP).

[47]  Wei Lin,et al.  Learning From Synthetic Data for Crowd Counting in the Wild , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Vishal M. Patel,et al.  Generating High-Quality Crowd Density Maps Using Contextual Pyramid CNNs , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[49]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.

[50]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[51]  Joshua Achiam,et al.  On First-Order Meta-Learning Algorithms , 2018, ArXiv.

[52]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[53]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Michael I. Jordan,et al.  Learning Transferable Features with Deep Adaptation Networks , 2015, ICML.