Counting in Dense Crowds using Deep Features

The goal of this paper is to introduce a new approach for counting humans in images of dense crowds. We work with the UCF Crowd Counting Dataset, which contains fifty crowd images with head counts ranging from 96 to 4633. Our first method extracts features from a pre-trained convolutional neural network (CNN) and trains a support vector machine (SVM) to predict a head count for each image. The second method extracts Dense SIFT features from the dataset, encodes them as Fisher Vectors, and trains a second SVM on those encodings. As an extension of both methods, we fuse SIFT features with either the CNN features or the Fisher Vectors before training. Finally, we quantify counting performance with the absolute difference and the normalized absolute difference between the ground-truth and estimated head counts.
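The evaluation step and the feature fusion described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the normalized absolute difference divides each image's absolute error by its ground-truth count, that per-image errors are summarized by their mean and standard deviation over the dataset, and that "fusing" two feature types means concatenating their L2-normalized vectors. The function names (`absolute_differences`, `normalized_absolute_differences`, `summarize`, `fuse`) and the sample counts are hypothetical and purely illustrative.

```python
import math
import statistics
from typing import Dict, List, Sequence


def absolute_differences(truth: Sequence[float], estimate: Sequence[float]) -> List[float]:
    """Per-image absolute difference |estimate - truth|."""
    if len(truth) != len(estimate):
        raise ValueError("truth and estimate must have the same length")
    return [abs(e - t) for t, e in zip(truth, estimate)]


def normalized_absolute_differences(truth: Sequence[float], estimate: Sequence[float]) -> List[float]:
    """Per-image absolute difference divided by the ground-truth count.

    Assumption: normalization is by the ground truth. Ground-truth counts
    are positive here (the dataset's minimum is 96), so division is safe.
    """
    return [d / t for d, t in zip(absolute_differences(truth, estimate), truth)]


def summarize(truth: Sequence[float], estimate: Sequence[float]) -> Dict[str, float]:
    """Mean and population standard deviation of both metrics over all images."""
    ad = absolute_differences(truth, estimate)
    nad = normalized_absolute_differences(truth, estimate)
    return {
        "AD_mean": statistics.mean(ad),
        "AD_std": statistics.pstdev(ad),
        "NAD_mean": statistics.mean(nad),
        "NAD_std": statistics.pstdev(nad),
    }


def fuse(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Assumed early fusion: concatenate two L2-normalized feature vectors."""

    def l2_normalize(v: Sequence[float]) -> List[float]:
        norm = math.sqrt(sum(x * x for x in v))
        # Leave an all-zero vector unchanged instead of dividing by zero.
        return [x / norm for x in v] if norm > 0 else list(v)

    return l2_normalize(a) + l2_normalize(b)


if __name__ == "__main__":
    truth = [100, 400]     # illustrative ground-truth counts, not dataset values
    estimate = [110, 360]  # illustrative SVM predictions, not reported results
    print(summarize(truth, estimate))
    print(fuse([3.0, 4.0], [0.0, 2.0]))
```

Normalizing each feature type before concatenation keeps one descriptor (for example, a high-dimensional Fisher Vector) from dominating the other purely because of its scale; whether the paper does this is not stated in the abstract.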
