Counting moving persons in crowded scenes

The paper presents a method for estimating the number of moving people in a scene for video surveillance applications. The performance of the method has been characterized on the public database used for the PETS 2009 and 2010 international competitions, and the method has been compared, on the same database, with the entries of the PETS competition participants. The system exhibits high accuracy and proved fast enough to be used in real-time surveillance applications. The method rests on the extraction of suitable scale-invariant feature points and the subsequent selection of the moving ones among them, under the hypothesis that the moving points are associated with moving people. Perspective distortion is taken into account by dividing the input frames into smaller horizontal zones, each having (approximately) the same perspective effects. The number of people is therefore estimated separately for each zone, and the results are summed. The most distinctive feature of the proposed method is a simple training procedure that uses a brief video sequence showing a person walking around in the scene; the procedure automatically estimates all the parameters needed by the system, which makes the method particularly suited for end-user applications.
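
The per-zone estimation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the equal-height bands, and the simple "moving points per person" scaling factor for each zone are all assumptions made for illustration. The abstract does not specify how the moving points of a zone are mapped to a number of people, nor how the trained parameters are represented.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) position of a moving feature point, in pixels


def zone_index(y: float, frame_height: float, n_zones: int) -> int:
    """Map a vertical coordinate to one of n_zones equal-height horizontal bands.

    Equal-height bands are an assumption; the source only says the zones
    have approximately uniform perspective effects.
    """
    band_height = frame_height / n_zones
    idx = int(y // band_height)
    return max(0, min(idx, n_zones - 1))  # clamp points on or outside the border


def count_points_per_zone(
    moving_points: Sequence[Point], frame_height: float, n_zones: int
) -> List[int]:
    """Count how many moving feature points fall into each horizontal zone."""
    counts = [0] * n_zones
    for _x, y in moving_points:
        counts[zone_index(y, frame_height, n_zones)] += 1
    return counts


def estimate_people(
    moving_points: Sequence[Point],
    frame_height: float,
    points_per_person: Sequence[float],
) -> float:
    """Estimate the people count zone by zone and sum the partial results.

    points_per_person[i] is a hypothetical trained parameter: the average
    number of moving points one person produces in zone i (people far from
    the camera appear smaller and produce fewer points).
    """
    counts = count_points_per_zone(moving_points, frame_height, len(points_per_person))
    return sum(n / k for n, k in zip(counts, points_per_person))
```

In this sketch, calling `estimate_people(points, 300, [2.0, 4.0, 8.0])` divides a 300-pixel-high frame into three bands and treats a person in the top band as producing about 2 moving points, against about 8 in the bottom band. In a real system these factors would come from the training sequence rather than being set by hand.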
