Robust representation and recognition of actions in video

Recognizing actions from video and other sensory data is important for a number of applications such as surveillance and human-computer interaction. While the potential applications are compelling and have inspired extensive research on this topic, several difficult challenges remain. These challenges can be broadly classified into four key problems: (1) action representation; (2) feature extraction; (3) learning; and (4) inference. There is a range of possible approaches to each of these problems, and the choice depends on the application domain: whether it involves a single person or multiple actors, whether the camera is static or moving, and whether the background is static or moving. In our work, we focus on recognizing single-person actions under a range of background and imaging conditions. We have worked on each of the four key problems in action recognition in this domain and have made novel contributions. These include the use of hierarchical graphical models for high-level action representation, as well as efficient low-level features that are robust to background clutter, background motion, and camera motion. We will describe the techniques developed during our research and present results on a range of challenging indoor and outdoor video sequences. This work has several potential applications, including human-computer interaction, intelligent rooms, and monitoring of lightly crowded areas in offices and grocery stores.