Accurate Gigapixel Crowd Counting by Iterative Zooming and Refinement

The increasing prevalence of gigapixel-resolution images has presented new challenges for crowd counting. Such resolutions far exceed the memory and computation limits of current GPUs, and available deep neural network architectures and training procedures are not designed for such massive inputs. Although several methods have been proposed to address these challenges, they either downsample the input image to a small size or borrow techniques from other gigapixel tasks that are not tailored to crowd counting. In this paper, we propose a novel method called GigaZoom, which iteratively zooms into the densest areas of the image and refines coarser density maps with finer details. Through experiments, we show that GigaZoom achieves state-of-the-art results for gigapixel crowd counting, improving on the accuracy of the next best method by 42%.
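The zoom-and-refine idea can be illustrated with a toy sketch. This is not the GigaZoom implementation: the abstract gives no algorithmic detail, so every name here (`block_sum`, `toy_estimator`, `zoom_and_refine`) and every parameter (`size`, `zoom`, `steps`, `cap`) is a hypothetical stand-in. The "image" is a square array of head annotations, and the estimator is a placeholder for a density network with a fixed input size that undercounts when many people fall into one cell.

```python
import numpy as np

def block_sum(a, factor):
    """Sum non-overlapping factor x factor blocks of a square 2D array."""
    h, w = a.shape
    return a.reshape(h // factor, factor, w // factor, factor).sum(axis=(1, 3))

def toy_estimator(region, out_size, cap=4.0):
    """Hypothetical stand-in for a density network: pools the region to
    out_size x out_size cells and saturates each cell at `cap`, which
    mimics undercounting in dense areas at coarse resolution."""
    factor = region.shape[0] // out_size
    return np.minimum(block_sum(region.astype(float), factor), cap)

def zoom_and_refine(image, estimator, size=8, zoom=2, steps=2):
    """Assumed procedure: estimate a coarse density map, zoom into its
    densest window, and replace the coarse estimate there with a finer one.
    Requires a square image whose side is size * zoom**k."""
    y, x, side = 0, 0, image.shape[0]
    total = 0.0
    for step in range(steps + 1):
        dmap = estimator(image[y:y + side, x:x + side], size)
        if step == steps or side // zoom < size:
            total += dmap.sum()  # finest level: keep the whole map
            break
        # Locate the densest window of win x win cells in the current map.
        win = size // zoom
        best, by, bx = -1.0, 0, 0
        for i in range(size - win + 1):
            for j in range(size - win + 1):
                s = dmap[i:i + win, j:j + win].sum()
                if s > best:
                    best, by, bx = s, i, j
        # Keep the coarse estimate outside the window; refine inside it.
        total += dmap.sum() - best
        cell = side // size  # image pixels per density-map cell
        y, x, side = y + by * cell, x + bx * cell, win * cell
    return float(total)

if __name__ == "__main__":
    img = np.zeros((32, 32))
    img[4:8, 4:8] = 1          # a dense cluster of 16 people
    img[20, 20] = img[28, 10] = 1  # two isolated people
    print(zoom_and_refine(img, toy_estimator, steps=0))  # coarse pass only
    print(zoom_and_refine(img, toy_estimator, steps=2))  # with two zoom steps
```

In this toy setting the coarse pass saturates on the dense cluster, while zooming recovers the individuals inside it. A real system would replace `toy_estimator` with a trained density model and would likely refine several regions per level rather than one.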
