Crowd Counting by Adapting Convolutional Neural Networks with Side Information

Computer vision tasks often have side information available that is helpful to solve the task. For example, for crowd counting, the camera perspective (e.g., camera angle and height) gives a clue about the appearance and scale of people in the scene. While side information has been shown to be useful for counting systems using traditional hand-crafted features, it has not been fully utilized in counting systems based on deep learning. In order to incorporate the available side information, we propose an adaptive convolutional neural network (ACNN), where the convolutional filter weights adapt to the current scene context via the side information. In particular, we model the filter weights as a low-dimensional manifold, parametrized by the side information, within the high-dimensional space of filter weights. With the help of side information and adaptive weights, the ACNN can disentangle the variations related to the side information, and extract discriminative features related to the current context. Since existing crowd counting datasets do not contain ground-truth side information, we collect a new dataset with the ground-truth camera angle and height as the side information. On experiments in crowd counting, the ACNN improves counting accuracy compared to a plain CNN with a similar number of parameters. We also apply ACNN to image deconvolution to show its potential effectiveness on other computer vision applications.

[1]  Rob Fergus,et al.  Restoring an Image Taken through a Window Covered with Dirt or Rain , 2013, 2013 IEEE International Conference on Computer Vision.

[2]  Sridha Sridharan,et al.  Crowd Counting Using Multiple Local Features , 2009, 2009 Digital Image Computing: Techniques and Applications.

[3]  Shenghua Gao,et al.  Single-Image Crowd Counting via Multi-Column Convolutional Neural Network , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Lior Wolf,et al.  A Dynamic Convolutional Layer for short rangeweather prediction , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Shaogang Gong,et al.  Cumulative Attribute Space for Age and Crowd Density Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Andrew L. Maas Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .

[7]  Ivan Laptev,et al.  Density-aware person detection and tracking in crowds , 2011, ICCV.

[8]  Antoni B. Chan,et al.  Small instance detection by integer programming on object density maps , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Roberto Cipolla,et al.  Unsupervised Bayesian Detection of Independent Motion in Crowds , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[10]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[11]  Nuno Vasconcelos,et al.  Counting People With Low-Level Features and Bayesian Regression , 2012, IEEE Transactions on Image Processing.

[12]  Haroon Idrees,et al.  Multi-source Multi-scale Counting in Extremely Dense Crowd Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Ullrich Köthe,et al.  Learning to count with regression forest and structured labels , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[14]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[15]  Andrew Zisserman,et al.  Interactive Object Counting , 2014, ECCV.

[16]  Shaogang Gong,et al.  Feature Mining for Localised Crowd Counting , 2012, BMVC.

[17]  Ramakant Nevatia,et al.  Detection of multiple, partially occluded humans in a single image by Bayesian combination of edgelet part detectors , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[18]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[20]  Hai Tao,et al.  A Viewpoint Invariant Approach for Crowd Counting , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[21]  Xiaogang Wang,et al.  Deep Learning Face Representation by Joint Identification-Verification , 2014, NIPS.

[22]  Peter Young,et al.  Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics , 2013, J. Artif. Intell. Res..

[23]  Serge J. Belongie,et al.  Counting Crowded Moving Objects , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[24]  Luc Van Gool,et al.  Dynamic Filter Networks , 2016, NIPS.

[25]  Antoni B. Chan,et al.  Heterogeneous Multi-task Learning for Human Pose Estimation with Deep Convolutional Neural Network , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[26]  Nuno Vasconcelos,et al.  Privacy preserving crowd monitoring: Counting people without people models or tracking , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[28]  Cyrus Rashtchian,et al.  Collecting Image Annotations Using Amazon’s Mechanical Turk , 2010, Mturk@HLT-NAACL.

[29]  Stefan Harmeling,et al.  Image denoising: Can plain neural networks compete with BM3D? , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Xiaogang Wang,et al.  Cross-scene crowd counting via deep convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Xiaoou Tang,et al.  Facial Landmark Detection by Deep Multi-task Learning , 2014, ECCV.

[32]  Tommy W. S. Chow,et al.  A neural-based crowd estimation by hybrid global learning algorithm , 1999, IEEE Trans. Syst. Man Cybern. Part B.

[33]  Andrew Zisserman,et al.  Learning To Count Objects in Images , 2010, NIPS.

[34]  Nuno Vasconcelos,et al.  Bayesian Poisson regression for crowd counting , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[35]  Ce Liu,et al.  Deep Convolutional Neural Network for Image Deconvolution , 2014, NIPS.