Wisdom of (Binned) Crowds: A Bayesian Stratification Paradigm for Crowd Counting

Datasets for training crowd counting deep networks are typically heavy-tailed in count distribution and exhibit discontinuities across the count range. As a result, the de facto statistical measures (MSE, MAE) exhibit large variance and tend to be unreliable indicators of performance across the count range. To address these concerns in a holistic manner, we revise processes at various stages of the standard crowd counting pipeline. To enable principled and balanced minibatch sampling, we propose a novel smoothed Bayesian sample stratification approach. We also propose a novel cost function that can be readily incorporated into existing crowd counting deep networks to encourage strata-aware optimization. We analyze the performance of representative crowd counting approaches across standard datasets, both per stratum and in aggregate, and demonstrate that our proposed modifications noticeably reduce error standard deviation. Our contributions represent a nuanced, statistically balanced and fine-grained characterization of performance for crowd counting approaches.
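To make the idea of strata-aware sampling and evaluation concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the stratum boundaries are already known: the `edges` values, the toy counts and predictions, and all three function names are hypothetical, and fixed edges stand in for the smoothed Bayesian stratification, which the abstract does not specify in detail.

```python
import numpy as np

def assign_strata(counts, edges):
    """Map each ground-truth count to a stratum index using interior bin edges."""
    return np.digitize(counts, edges)

def balanced_batch(strata, batch_size, rng):
    """Draw a minibatch with equal representation from every stratum.

    Samples with replacement so that sparse (tail) strata can still fill
    their share of the batch.
    """
    labels = np.unique(strata)
    per_stratum = batch_size // len(labels)
    picks = [
        rng.choice(np.flatnonzero(strata == s), size=per_stratum, replace=True)
        for s in labels
    ]
    return np.concatenate(picks)

def per_stratum_mae(y_true, y_pred, strata):
    """Mean absolute error reported separately for each stratum."""
    err = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return {int(s): float(err[strata == s].mean()) for s in np.unique(strata)}

# Toy data (hypothetical): a heavy-tailed set of image-level counts.
counts = np.array([5, 12, 40, 55, 300, 2500])
preds = np.array([7, 10, 44, 50, 320, 2000])
edges = [50, 500]  # assumed stratum boundaries, not taken from the paper

strata = assign_strata(counts, edges)
batch = balanced_batch(strata, batch_size=6, rng=np.random.default_rng(0))
aggregate_mae = float(np.abs(counts - preds).mean())
stratum_mae = per_stratum_mae(counts, preds, strata)
```

In this toy setting the aggregate MAE is dominated by the single image in the highest-count stratum, while the per-stratum values show that the low-count strata are predicted far more accurately. This is the kind of disparity that per-stratum reporting is meant to expose.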
