Indoor Crowd Counting by Mixture of Gaussians Label Distribution Learning

In this paper, we tackle the problem of crowd counting in indoor videos, where people often remain almost static for long periods. Label distributions, which cover a range of crowd counting labels and represent the degree to which each label describes a video frame, have previously been adopted to model the label ambiguity of the crowd number. However, since the label ambiguity is significantly affected by the crowd number of the scene, we initialize the label distribution of each frame with a discretized Gaussian distribution with adaptive variance instead of the original single static Gaussian distribution. Moreover, considering the gradual change of crowd numbers across adjacent frames, we propose a mixture of Gaussian models to generate the final label distribution representation for each frame. The weights of the Gaussian models depend on the frame and feature distances between the current frame and its adjacent frames. The mixed $\ell_{2,1}$-norm is adopted to constrain the weights used to predict adjacent crowd numbers to be locally correlated. We collect three new indoor video datasets with frame number annotation for further research. The proposed approach achieves state-of-the-art performance on seven challenging indoor videos and in cross-scene experiments.
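
As a rough illustration of the label distribution construction described above, the following Python sketch builds a discretized Gaussian per frame and mixes the Gaussians of neighboring frames. It is a minimal sketch under stated assumptions, not the formulation of the paper: the linear variance rule in `adaptive_sigma`, the exponential weighting by frame and feature distance, and all parameter names (`base`, `scale`, `window`, `tau_frame`, `tau_feat`) are hypothetical choices made only for illustration. The helper `l21_norm` uses one common convention for the $\ell_{2,1}$-norm (the sum of the $\ell_2$ norms of the rows); the abstract does not say which convention is used.

```python
import numpy as np


def discretized_gaussian(count, num_labels, sigma):
    """Gaussian centered at `count`, evaluated on labels 0..num_labels-1 and normalized."""
    labels = np.arange(num_labels)
    p = np.exp(-0.5 * ((labels - count) / sigma) ** 2)
    return p / p.sum()


def adaptive_sigma(count, base=0.5, scale=0.1):
    """ASSUMPTION: the spread grows linearly with the crowd number (hypothetical rule)."""
    return base + scale * count


def mixture_label_distribution(counts, features, t, num_labels,
                               window=2, tau_frame=1.0, tau_feat=1.0):
    """Mix the Gaussians of frames within `window` of frame t.

    ASSUMPTION: each weight decays exponentially with the squared frame
    distance and the squared feature distance to frame t (hypothetical form).
    """
    lo, hi = max(0, t - window), min(len(counts), t + window + 1)
    components, weights = [], []
    for i in range(lo, hi):
        sigma = adaptive_sigma(counts[i])
        components.append(discretized_gaussian(counts[i], num_labels, sigma))
        frame_dist = abs(i - t)
        feat_dist = np.linalg.norm(features[i] - features[t])
        weights.append(np.exp(-frame_dist ** 2 / (2 * tau_frame ** 2))
                       * np.exp(-feat_dist ** 2 / (2 * tau_feat ** 2)))
    weights = np.array(weights) / np.sum(weights)  # mixture weights sum to 1
    return weights @ np.array(components)          # shape: (num_labels,)


def l21_norm(W):
    """Sum of the l2 norms of the rows of W (one common l_{2,1} convention)."""
    return float(np.sum(np.linalg.norm(W, axis=1)))


# Usage: five frames with slowly changing counts and random 8-dim features.
rng = np.random.default_rng(0)
counts = np.array([4, 5, 5, 6, 6])
features = rng.normal(size=(5, 8))
dist = mixture_label_distribution(counts, features, t=2, num_labels=20)
```

Because every component and the weight vector are normalized, the resulting `dist` is itself a valid distribution over the count labels. Frames that are closer in time and in feature space contribute more to it.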
