Analyzing Structured Scenarios by Tracking People and Their Limbs

Title of dissertation: ANALYZING STRUCTURED SCENARIOS BY TRACKING PEOPLE AND THEIR LIMBS

Vlad I. Morariu, Doctor of Philosophy, 2010

Dissertation directed by: Professor Larry S. Davis, Department of Computer Science

The analysis of human activities is a fundamental problem in computer vision. Though complex, interactions between people and their environment often exhibit a spatio-temporal structure that can be exploited during analysis. This structure can be leveraged to mitigate the effects of missing or noisy visual observations caused, for example, by sensor noise, inaccurate models, or occlusion. Trajectories of people and their hands and feet, often sufficient for recognition of human activities, lead to a natural qualitative spatio-temporal description of these interactions.

This work introduces the following contributions to the task of human activity understanding: 1) a framework that efficiently detects and tracks multiple interacting people and their limbs, 2) an event recognition approach that integrates both logical and probabilistic reasoning in analyzing the spatio-temporal structure of multi-agent scenarios, and 3) an effective computational model of the visibility constraints imposed on humans as they navigate through their environment.

The tracking framework mixes probabilistic models with deterministic constraints and uses AND/OR search and lazy evaluation to efficiently obtain the globally optimal solution in each frame. Our high-level reasoning framework efficiently and robustly interprets noisy visual observations to deduce the events comprising structured scenarios. This is accomplished by combining First-Order Logic, Allen’s Interval Logic, and Markov Logic Networks with an event hypothesis generation process that reduces the size of the ground Markov network.
When applied to outdoor one-on-one basketball videos, our framework tracks the players and, guided by the game rules, analyzes their interactions with each other and the ball, annotating the videos with the relevant basketball events that occurred.

Finally, motivated by studies of spatial behavior, we use a set of features from visibility analysis to represent spatial context in the interpretation of human spatial activities. We demonstrate the effectiveness of our representation on trajectories generated by humans in a virtual environment.
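The abstract names Allen's Interval Logic as one ingredient of the event reasoning. As a minimal illustrative sketch, and not the dissertation's implementation, the following Python function classifies the qualitative relation between two time intervals. The function name `allen_relation`, the `(start, end)` tuple representation, and the basketball-style event names and frame numbers are assumptions introduced here for illustration only.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a with respect to interval b.

    Intervals are (start, end) tuples with start < end (an assumed
    representation, e.g. video frame indices).
    """
    (a0, a1), (b0, b1) = a, b
    if a0 >= a1 or b0 >= b1:
        raise ValueError("intervals must satisfy start < end")

    # Disjoint intervals, or intervals touching at exactly one endpoint.
    if a1 < b0:
        return "before"
    if b1 < a0:
        return "after"
    if a1 == b0:
        return "meets"
    if b1 == a0:
        return "met_by"

    # Intervals sharing at least one endpoint.
    if a0 == b0 and a1 == b1:
        return "equals"
    if a0 == b0:
        return "starts" if a1 < b1 else "started_by"
    if a1 == b1:
        return "finishes" if a0 > b0 else "finished_by"

    # Strict nesting.
    if b0 < a0 and a1 < b1:
        return "during"
    if a0 < b0 and b1 < a1:
        return "contains"

    # Remaining case: partial overlap with all four endpoints distinct.
    return "overlaps" if a0 < b0 else "overlapped_by"


# Hypothetical event intervals, given as (first_frame, last_frame).
possession = (0, 60)
dribble = (10, 40)
shot = (40, 55)

print(allen_relation(dribble, shot))      # dribble ends exactly when shot begins
print(allen_relation(shot, possession))   # shot lies strictly inside possession
```

Qualitative relations of this kind allow rules about event ordering to be stated without committing to exact frame numbers, which fits settings where the observed interval endpoints are noisy.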
