Modeling Noisy Annotations for Point-Wise Supervision

Point-wise supervision is widely adopted in computer vision tasks such as crowd counting and human pose estimation. In practice, noise in the point annotations can significantly degrade the performance and robustness of the algorithms. In this paper, we investigate the effect of annotation noise in point-wise supervision and propose a series of robust loss functions for different tasks. In particular, we consider three kinds of point annotation noise: spatial-shift noise, missing-point noise, and duplicate-point noise. Spatial-shift noise is the most common and occurs in crowd counting, pose estimation, visual tracking, etc., while missing-point and duplicate-point noise usually appear in dense annotations, such as those for crowd counting. We first address the shift noise by modeling the real locations as random variables and the annotated points as noisy observations. We derive the probability density function of the intermediate representation (a smooth heat map generated from dot annotations) and use the negative log-likelihood as the loss function, which naturally models the shift uncertainty in the intermediate representation. The missing and duplicate noise are then modeled empirically, under the assumption that such noise appears in high-density regions with high probability. We apply the method to crowd counting, human pose estimation, and visual tracking, propose robust loss functions for those tasks, and achieve superior performance and robustness on widely used datasets.
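To make the three noise types concrete, the following is a minimal simulation sketch, not the method proposed in the paper. It assumes Gaussian spatial shifts and, as a simplification, drops or duplicates points uniformly at random rather than preferentially in high-density regions. The function names `add_annotation_noise` and `heat_map`, and all parameter values, are hypothetical and chosen only for illustration.

```python
import numpy as np


def add_annotation_noise(points, shift_std=2.0, p_missing=0.05,
                         p_duplicate=0.05, seed=0):
    """Perturb an (N, 2) array of (x, y) annotations with three noise types.

    Assumption: noise is applied independently per point; the paper instead
    assumes missing/duplicate noise is more likely in dense regions.
    """
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Missing-point noise: drop each annotation with probability p_missing.
    kept = points[rng.random(len(points)) >= p_missing]
    # Duplicate-point noise: repeat each surviving annotation with
    # probability p_duplicate.
    duplicates = kept[rng.random(len(kept)) < p_duplicate]
    noisy = np.concatenate([kept, duplicates], axis=0)
    # Spatial-shift noise: add an i.i.d. Gaussian offset to every annotation.
    return noisy + rng.normal(0.0, shift_std, size=noisy.shape)


def heat_map(points, height, width, sigma=4.0):
    """Smooth heat map: a sum of normalized Gaussian kernels, one per point."""
    ys, xs = np.mgrid[0:height, 0:width]
    out = np.zeros((height, width))
    for x, y in points:
        out += np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
    return out / (2.0 * np.pi * sigma ** 2)


clean = np.array([[20.0, 20.0], [32.0, 32.0], [44.0, 40.0]])
noisy = add_annotation_noise(clean, shift_std=2.0, seed=1)
# Difference between the heat maps built from clean and noisy annotations.
diff = np.abs(heat_map(clean, 64, 64) - heat_map(noisy, 64, 64))
```

Comparing the two heat maps shows how annotation noise propagates into the intermediate representation that the loss is computed on, which is the uncertainty the negative log-likelihood loss is meant to account for.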
