Towards Unsupervised Crowd Counting via Regression-Detection Bi-knowledge Transfer

Unsupervised crowd counting is a challenging and largely unexplored task. In this paper, we study it in a transfer learning setting: we learn to detect and count persons in an unlabeled target set by transferring bi-knowledge learnt from regression- and detection-based models trained on a labeled source set. The knowledge from the two source models is heterogeneous and complementary, as they capture different modalities of the crowd distribution. We formulate the mutual transformations between the outputs of the regression- and detection-based models as two scene-agnostic transformers, which enable knowledge distillation between the two models. Given the regression- and detection-based models and their mutual transformers learnt in the source, we introduce an iterative self-supervised learning scheme with regression-detection bi-knowledge transfer in the target. Extensive experiments on the standard crowd counting benchmarks ShanghaiTech, UCF_CC_50, and UCF_QNRF demonstrate a substantial improvement of our method over other state-of-the-art methods in the transfer learning setting.
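To make the idea of mapping between detection outputs and regression outputs concrete, the following is a minimal sketch and not the transformers proposed in the paper. It assumes the simplest fixed, non-learned stand-ins: detected head centers are rendered as unit-mass Gaussians to form a density map, and a density map is turned back into point detections by 3x3 local-maximum search. The function names, the `sigma` value, and the `threshold` value are all hypothetical choices made for illustration.

```python
import numpy as np


def detections_to_density(centers, shape, sigma=2.0):
    """Render (row, col) head centers as a density map whose sum equals the head count.

    Illustrative stand-in only: a fixed Gaussian per detection, not a learned model.
    """
    rows = np.arange(shape[0])[:, None]
    cols = np.arange(shape[1])[None, :]
    density = np.zeros(shape, dtype=np.float64)
    for r, c in centers:
        kernel = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        density += kernel / kernel.sum()  # each person contributes unit mass
    return density


def density_to_peaks(density, threshold=0.01):
    """Return (row, col) positions that exceed `threshold` and are 3x3 local maxima.

    Illustrative stand-in only: simple peak picking, not a learned model.
    """
    h, w = density.shape
    # Pad with -inf so border pixels can still qualify as maxima.
    padded = np.pad(density, 1, mode="constant", constant_values=-np.inf)
    is_peak = density > threshold
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            neighbor = padded[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
            is_peak &= density >= neighbor
    return [(int(r), int(c)) for r, c in np.argwhere(is_peak)]


if __name__ == "__main__":
    heads = [(10, 10), (30, 40)]
    density = detections_to_density(heads, shape=(64, 64))
    print("estimated count:", density.sum())
    print("recovered points:", density_to_peaks(density))
```

Under these assumptions the count is read off as the sum of the density map, which is what lets pseudo-labels from one kind of model supervise the other. Such hand-crafted mappings degrade when heads overlap heavily or vary widely in scale, so they should be read only as an illustration of the two directions of transformation.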
