Event semantics in two-person interactions

This work presents a method for representing two-person interactions at a semantic level and describing them in natural language. A human interaction is composed of two single-person actions, each of which is in turn made up of torso and arm/leg motions. We adopt the verb argument structure from linguistics to represent each human action as a triplet. Two-person interactions are then represented in detail as multiple triplets aligned along a timeline according to the spatial and temporal constraints of the interaction. The method yields a user-friendly natural-language description of human interactions and covers positive, neutral, and negative interactions between two persons.
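
As an illustration only, the triplet-on-a-timeline idea can be sketched in a few lines of Python. Everything here is an assumption made for the example rather than the representation used in this work: the field names `agent`, `motion`, and `target` (chosen by analogy with subject, verb, and object in verb argument structure), the frame-index intervals, the three coarse temporal relations, and the sample motions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """One action as a verb-argument-style triplet (field names are assumed)."""
    agent: str    # who acts, e.g. "person A"
    motion: str   # body-part motion phrase, e.g. "stretches arm toward"
    target: str   # whom or what the motion is directed at
    start: int    # first frame of the motion (inclusive)
    end: int      # last frame of the motion (inclusive)

    def describe(self) -> str:
        # Render the triplet as a simple subject-verb-object sentence.
        return f"{self.agent} {self.motion} {self.target}"


def relation(a: Triplet, b: Triplet) -> str:
    """Coarse temporal relation between the intervals of two triplets."""
    if a.end < b.start:
        return "before"
    if b.end < a.start:
        return "after"
    return "overlaps"


def narrate(triplets: list[Triplet]) -> list[str]:
    """Order triplets along the timeline and describe each one in words."""
    ordered = sorted(triplets, key=lambda t: (t.start, t.end))
    return [t.describe() for t in ordered]


# Hypothetical interaction: A reaches toward B, and B leans away.
timeline = [
    Triplet("person B", "moves torso away from", "person A", 12, 30),
    Triplet("person A", "stretches arm toward", "person B", 5, 14),
]

for sentence in narrate(timeline):
    print(sentence)
print(relation(timeline[1], timeline[0]))
```

Sorting by start frame gives the narration order, while `relation` exposes whether two motions overlap in time, which is the kind of temporal constraint that separates, say, a simultaneous handshake from a push followed by a retreat.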
