Domain-adaptive Crowd Counting via Inter-domain Features Segregation and Gaussian-prior Reconstruction

Recently, crowd counting based on supervised learning has achieved remarkable improvements. Nevertheless, most counters rely on a large amount of manually labeled data. With the release of synthetic crowd data, a potential alternative is to transfer knowledge from synthetic data to real data without any manual labels. However, no existing method effectively suppresses domain gaps while producing elaborate density maps during the transfer. To remedy these problems, this paper proposes a Domain-Adaptive Crowd Counting (DACC) framework, which consists of Inter-domain Features Segregation (IFS) and Gaussian-prior Reconstruction (GPR). Specifically, IFS translates synthetic data into realistic images through two components: domain-shared feature extraction and domain-independent feature decoration. A coarse counter is then trained on the translated data and applied to real-world images. Based on these coarse predictions, GPR generates pseudo labels to improve prediction quality on the real data. Finally, we retrain a final counter using these pseudo labels. Adaptation experiments on six real-world datasets demonstrate that the proposed method outperforms state-of-the-art methods. The code and pre-trained models will be released as soon as possible.
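The abstract does not specify how GPR turns coarse predictions into pseudo labels. Purely as an illustration, the sketch below assumes one plausible reading: treat local maxima of the coarse density map as head locations, then re-render each location as a normalized Gaussian blob so that every detected head contributes a count of one. This is not the authors' algorithm; the function names (`gaussian_kernel`, `local_maxima`, `gaussian_prior_pseudo_label`) and the parameters `threshold`, `size`, and `sigma` are hypothetical choices, not values from the paper.

```python
import numpy as np


def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Return a (size x size) 2-D Gaussian kernel normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def local_maxima(density: np.ndarray, threshold: float) -> np.ndarray:
    """Boolean mask of pixels >= all 8 neighbours and above `threshold`."""
    density = density.astype(float)
    h, w = density.shape
    # Pad with -inf so border pixels are compared only against real neighbours.
    padded = np.pad(density, 1, mode="constant", constant_values=-np.inf)
    is_max = np.ones_like(density, dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            is_max &= density >= neighbour
    return is_max & (density > threshold)


def gaussian_prior_pseudo_label(coarse: np.ndarray, threshold: float = 0.01,
                                size: int = 15, sigma: float = 4.0):
    """Hypothetical GPR-style step: re-render coarse peaks as Gaussian blobs.

    Returns the pseudo density map and the number of detected peaks.
    `size` is assumed odd so each kernel has a well-defined centre pixel.
    """
    h, w = coarse.shape
    r = size // 2
    kernel = gaussian_kernel(size, sigma)
    peaks = np.argwhere(local_maxima(coarse, threshold))
    # Accumulate on a padded canvas so kernels near the border do not overflow.
    canvas = np.zeros((h + 2 * r, w + 2 * r))
    for y, x in peaks:
        # Pixel (y, x) maps to (y + r, x + r) on the canvas, the kernel centre.
        canvas[y:y + size, x:x + size] += kernel
    # Cropping back discards any mass that fell outside the image.
    return canvas[r:r + h, r:r + w].copy(), len(peaks)
```

For example, a coarse map with two isolated peaks yields a pseudo label containing two unit-mass Gaussians. The design choice here is that the Gaussian prior replaces the counter's possibly blurry or spread-out responses with sharp, well-shaped blobs; a real implementation might instead adapt `sigma` per head or keep the coarse count, which this sketch does not attempt.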
