Event recognition based on social roles in continuous video

In this paper, we present a new method for video event recognition based on the social roles of agents, which are inferred from their daily activities in continuous video. The method is motivated by the observation that people have social roles, and knowing the social roles in a given scene provides useful cues for recognizing video events. First, events are represented by an And-Or Graph (AOG), which captures both the hierarchical decomposition of events into sub-events and atomic actions and the temporal-relation context among them. Then, a model of social roles is proposed to infer the roles of the agents in continuous video. Finally, an improved event parsing algorithm that uses social-role context is adopted to recognize events. Experimental results show that our method is effective at inferring social roles and improves event recognition performance.
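The three steps in the abstract can be illustrated with a toy sketch. It assumes a simple AOG in which And-nodes require their children in temporal order and Or-nodes choose among weighted alternatives. Everything in it is hypothetical: the event names, atomic actions, branch probabilities and `P(event | role)` tables are invented for illustration, and the social role is supplied as a fixed prior rather than inferred as in the paper. It is not the authors' parsing algorithm. It only shows how a role prior can re-rank events whose observed atomic actions are identical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Toy And-Or Graph node (hypothetical structure, for illustration only)."""
    name: str
    kind: str                                          # "and", "or" or "leaf"
    children: List["Node"] = field(default_factory=list)
    probs: List[float] = field(default_factory=list)  # Or-node branch probabilities


def parse(node: Node, actions: List[str], pos: int) -> Dict[int, float]:
    """Map each end position reachable from `pos` to the best parse probability."""
    if node.kind == "leaf":
        # A terminal (atomic action) matches exactly one observed action.
        if pos < len(actions) and actions[pos] == node.name:
            return {pos + 1: 1.0}
        return {}
    if node.kind == "and":
        # And-node: children must occur one after another in temporal order.
        frontier = {pos: 1.0}
        for child in node.children:
            nxt: Dict[int, float] = {}
            for start, p in frontier.items():
                for end, q in parse(child, actions, start).items():
                    nxt[end] = max(nxt.get(end, 0.0), p * q)
            frontier = nxt
        return frontier
    # Or-node: alternative branches, each weighted by its branch probability.
    out: Dict[int, float] = {}
    for child, w in zip(node.children, node.probs):
        for end, q in parse(child, actions, pos).items():
            out[end] = max(out.get(end, 0.0), w * q)
    return out


def recognize(events: Dict[str, Node], actions: List[str],
              role_prior: Dict[str, float]) -> Tuple[str, float]:
    """Return the event maximizing P(actions | event) * P(event | role)."""
    best_name, best_score = "", 0.0
    for name, root in events.items():
        # Only parses that explain the whole action sequence count.
        likelihood = parse(root, actions, 0).get(len(actions), 0.0)
        score = likelihood * role_prior.get(name, 0.0)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


def leaf(name: str) -> Node:
    return Node(name, "leaf")


# Hypothetical events that share atomic actions; only the role context differs.
events = {
    "unload_trunk": Node("unload_trunk", "and", [
        leaf("approach_car"), leaf("open_trunk"), leaf("take_object"), leaf("leave")]),
    "theft": Node("theft", "and", [
        leaf("approach_car"),
        Node("entry", "or", [leaf("open_trunk"), leaf("break_window")], [0.5, 0.5]),
        leaf("take_object"), leaf("leave")]),
}
observed = ["approach_car", "open_trunk", "take_object", "leave"]

# Hypothetical P(event | role) tables standing in for inferred social roles.
print(recognize(events, observed, {"unload_trunk": 0.9, "theft": 0.1}))  # ('unload_trunk', 0.9)
print(recognize(events, observed, {"unload_trunk": 0.1, "theft": 0.9}))  # ('theft', 0.45)
```

With the same observed actions, an "owner"-like prior selects `unload_trunk` while a "stranger"-like prior selects `theft`, which mirrors the motivation that role information helps disambiguate events. Returning the best probability per end position keeps the parse exhaustive over Or-branches instead of committing greedily to one alternative.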
