Inferring "Dark Matter" and "Dark Energy" from Videos

This paper presents an approach to localizing functional objects in surveillance videos without domain knowledge about the semantic object classes that may appear in the scene. Functional objects do not have discriminative appearance or shape, but they affect the behavior of people in the scene. For example, they "attract" people, who approach them to satisfy certain needs (e.g., a vending machine can quench thirst), or "repel" people, who avoid them (e.g., grass lawns). Functional objects can therefore be viewed as "dark matter" emanating "dark energy" that affects people's trajectories in the video. To detect "dark matter" and infer its "dark energy" field, we extend Lagrangian mechanics. People are treated as particle-agents with latent intents to approach "dark matter" and thus satisfy their needs, and their motions are subject to a composite "dark energy" field of all functional objects in the scene. We assume that people take globally optimal paths toward the intended "dark matter" while avoiding latent obstacles. A Bayesian framework probabilistically models people's trajectories and intents, the constraint map of the scene, and the locations of functional objects. A data-driven Markov Chain Monte Carlo (MCMC) process is used for inference. Our evaluation on videos of public squares and courtyards demonstrates the effectiveness of our approach in localizing functional objects and predicting people's trajectories in unobserved parts of the video footage.
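The assumption that people take globally optimal paths toward an intended goal while avoiding obstacles can be illustrated with a small sketch. This is not the paper's model: it assumes a discretized ground plane, 4-connected moves, a known constraint map, and Dijkstra's algorithm as the planner. The names `shortest_path`, `cost`, `start`, and `goal` are hypothetical and chosen for illustration only.

```python
import heapq

def shortest_path(cost, start, goal):
    """Dijkstra on a 4-connected grid (illustrative assumption).

    cost[r][c] is None for an obstacle cell, otherwise the positive
    cost of stepping onto that cell. Returns the list of cells from
    start to goal, or None if the goal cannot be reached.
    """
    rows, cols = len(cost), len(cost[0])
    dist = {start: 0.0}          # best known cost to reach each cell
    prev = {}                    # back-pointers for path recovery
    heap = [(0.0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if d > dist.get(cell, float("inf")):
            continue             # stale heap entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and cost[nr][nc] is not None:
                nd = d + cost[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (nd, (nr, nc)))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

# Toy constraint map: None marks an obstacle the agent must walk around.
grid = [
    [1, 1,    1,    1],
    [1, None, None, 1],
    [1, 1,    1,    1],
]
path = shortest_path(grid, start=(1, 0), goal=(1, 3))
```

In this toy setting the goal cell stands in for a functional object and the `None` cells for obstacles. In the approach described above, both the goals and the constraint map are latent and must be inferred from observed trajectories rather than given as input.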
