Person count localization in videos from noisy foreground and detections

This paper formulates and presents a solution to a new problem called person count localization. Given a video of a crowded scene, our goal is to output, for each frame, a set of: 1) detections optimally covering both isolated individuals and cluttered groups of people; and 2) counts of people inside these detections. This problem is a middle ground between frame-level person counting, which does not localize counts, and person detection, which aims at perfectly localizing people with count-one detections. Our problem formulation is important for a wide range of domains where people frequently appear under severe occlusion within a crowd. As these crowds are often visually distinct from the rest of the scene, they can be viewed as “visual phrases” whose spatially tight localization and count assignment could facilitate higher-level video understanding. For count localization, we specify a novel framework of iterative error-driven revisions of a flow graph derived from noisy person detections and foreground segmentation. Each iteration builds and solves an integer program for count localization on the current revision of the flow graph. Revisions are driven by detected violations of basic integrity constraints, which in turn trigger learned modifications to the graph aimed at reducing noise in the input features. For evaluation, we introduce a new metric that measures both count precision and localization, and we evaluate our approach on American football and pedestrian videos.
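To make the idea of an integer program over a flow graph concrete, the following is a minimal toy sketch, not the formulation used in the paper. It assumes, purely for illustration, that each detection carries a noisy real-valued count estimate, that edges link detections in consecutive frames, and that the only integrity constraint is flow conservation (people flowing into a detection equal people flowing out). The function name `localize_counts`, the absolute-error cost, and the brute-force search are all hypothetical choices; the iterative graph revision step described above is not modeled.

```python
from itertools import product


def localize_counts(estimates, edges, max_flow=5):
    """Brute-force a tiny integer program over integer edge flows.

    estimates: dict mapping detection name -> noisy real-valued count (assumed input)
    edges:     list of (u, v) pairs linking detections in consecutive frames
    Returns (counts, cost) minimizing the total absolute deviation from the estimates
    subject to flow conservation at every detection.
    """
    has_out = {u for u, _ in edges}
    has_in = {v for _, v in edges}
    best_counts, best_cost = None, float("inf")

    # Enumerate every integer flow assignment in [0, max_flow] per edge.
    for flows in product(range(max_flow + 1), repeat=len(edges)):
        inflow = {n: 0 for n in estimates}
        outflow = {n: 0 for n in estimates}
        for (u, v), f in zip(edges, flows):
            outflow[u] += f
            inflow[v] += f

        counts, feasible = {}, True
        for n in estimates:
            if n in has_in and n in has_out:
                # Conservation: people entering a detection must also leave it.
                if inflow[n] != outflow[n]:
                    feasible = False
                    break
                counts[n] = inflow[n]
            elif n in has_in:
                counts[n] = inflow[n]
            elif n in has_out:
                counts[n] = outflow[n]
            else:
                # Isolated detection: fall back to the rounded estimate.
                counts[n] = round(estimates[n])
        if not feasible:
            continue

        cost = sum(abs(counts[n] - estimates[n]) for n in estimates)
        if cost < best_cost:
            best_counts, best_cost = counts, cost
    return best_counts, best_cost


# Two detections in one frame (A, B) merge into one group detection (C) in the next.
example_counts, example_cost = localize_counts(
    {"A": 1.2, "B": 2.7, "C": 3.6},
    [("A", "C"), ("B", "C")],
)
print(example_counts)  # {'A': 1, 'B': 3, 'C': 4}
```

In this toy case the conservation constraint forces the merged detection C to hold the sum of the counts of A and B, so the noisy estimate 3.6 is resolved to 4 rather than rounded independently. A real system would replace the exhaustive search with an integer programming solver.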
