Estimation of crowd density by clustering motion cues

Understanding crowd behavior through automated video analytics has become an important research problem because monitoring large gatherings poses complex challenges. In automated video surveillance, estimating the crowd density in particular regions of the scene is an indispensable tool for understanding crowd behavior. Crowd density estimation provides a measure of the number of people in a given region at a specified time. While most existing computer vision methods rely on supervised training to arrive at density estimates, we propose an approach that estimates crowd density using motion cues and hierarchical clustering. The proposed method combines optical flow for motion estimation, contour analysis for crowd silhouette detection, and clustering to derive the crowd density. The approach has been tested on a dataset collected at the Melbourne Cricket Ground (MCG) and on two publicly available crowd datasets, Performance Evaluation of Tracking and Surveillance (PETS) 2009 and the University of California, San Diego (UCSD) Pedestrian Traffic Database, covering different crowd densities (medium- to high-density crowds) and varied environmental conditions (including partial occlusions). We show that the proposed approach yields accurate estimates of crowd density. The maximum mean error was 3.62 for the MCG and PETS datasets and 2.66 for the UCSD dataset. Compared with existing methods, the proposed approach delivered superior performance in 50% of the cases on the PETS 2009 dataset.
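The clustering step can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that optical flow has already produced a displacement vector for each tracked point, and it substitutes a naive single-linkage agglomerative clustering for whatever hierarchical scheme the paper actually uses. The function names and the thresholds `min_speed` and `max_gap` are hypothetical and chosen only for illustration.

```python
from math import hypot

def moving_points(points, flows, min_speed=1.0):
    """Keep only points whose flow magnitude is at least min_speed (assumed threshold)."""
    return [p for p, (dx, dy) in zip(points, flows) if hypot(dx, dy) >= min_speed]

def single_linkage_clusters(points, max_gap):
    """Naive agglomerative clustering: repeatedly merge the two closest
    clusters until the smallest inter-cluster gap exceeds max_gap."""
    clusters = [[p] for p in points]

    def gap(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(hypot(p[0] - q[0], p[1] - q[1]) for p in a for q in b)

    while len(clusters) > 1:
        d, i, j = min(
            ((gap(clusters[i], clusters[j]), i, j)
             for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda t: t[0],
        )
        if d > max_gap:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def estimate_count(points, flows, min_speed=1.0, max_gap=5.0):
    """Use the number of motion clusters as a crude stand-in for a density estimate."""
    moving = moving_points(points, flows, min_speed)
    return len(single_linkage_clusters(moving, max_gap))

# Synthetic example: three moving groups and one nearly static point.
points = [(0, 0), (1, 1), (2, 0), (20, 20), (21, 21), (50, 50), (80, 5)]
flows = [(2, 0), (2, 0), (1.5, 0.5), (0, 2), (0, 2), (0.1, 0), (3, 3)]
count = estimate_count(points, flows)
```

In this toy setting each cluster is treated as one person, which would only be plausible for sparse scenes. Mapping clusters to head counts in medium- to high-density crowds would need the silhouette and perspective handling described in the paper itself.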
