Graphical Models for Recognizing Human Interactions

We describe a real-time computer vision and machine learning system for modeling and recognizing human actions and interactions. Two different domains are explored: recognition of two-handed motions in the martial art 'Tai Chi', and multiple-person interactions in a visual surveillance task. Our system combines top-down with bottom-up information in a closed feedback loop, and is formulated within a Bayesian framework. Two different graphical models, hidden Markov models (HMMs) and coupled hidden Markov models (CHMMs), are used to model both individual actions and multiple-agent interactions, and CHMMs are shown to be more efficient and more accurate for a given amount of training data. Finally, to overcome the limited amount of training data, we demonstrate that 'synthetic agents' (Alife-style agents) can be used to develop flexible prior models of person-to-person interactions.
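As a brief sketch of the coupling (in our own notation, which the abstract does not fix), a two-chain CHMM conditions each chain's hidden state on the previous states of both chains, so that for hidden states $s_t, s'_t$ and observations $o_t, o'_t$ the joint distribution factors as

\[
P(\mathbf{s}, \mathbf{s}', \mathbf{o}, \mathbf{o}')
= P(s_1)\,P(s'_1)
\prod_{t=2}^{T} P(s_t \mid s_{t-1}, s'_{t-1})\, P(s'_t \mid s'_{t-1}, s_{t-1})
\prod_{t=1}^{T} P(o_t \mid s_t)\, P(o'_t \mid s'_t),
\]

whereas a standard HMM must either model each agent's chain in isolation, losing the interaction, or collapse both chains into a single chain over the product state space, which requires many more transition parameters and hence more training data to fit.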