The POETICON enacted scenario corpus — A tool for human and computational experiments on action understanding

A good data corpus lies at the heart of progress in both perceptual/cognitive science and computer vision. While a few datasets deal with simple actions, to our knowledge no one has yet attempted to create a realistic corpus of complex, long action sequences that also contains human-human interactions. Here, we introduce such a corpus for (inter)action understanding, comprising six everyday scenarios that take place in a kitchen/living-room setting. Each scenario was acted out several times by different pairs of actors and contains simple object interactions as well as spoken dialogue. Each scenario was recorded with several HD cameras and, in addition, with motion capture of the actors and several key objects. Access to the motion-capture data allows not only for kinematic analyses but also for the production of realistic animations in which all aspects of the scenario can be fully controlled. We also present results from a first series of perceptual experiments showing how humans infer scenario classes, as well as individual actions and objects, from computer animations of everyday situations. These results can serve as a benchmark for future computational approaches to complex action understanding.
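As a minimal sketch of the kind of kinematic analysis such motion-capture data makes possible, the snippet below computes per-frame speed of a single hand marker. The file name, column layout, and 120 Hz frame rate are illustrative assumptions, not the corpus's actual export format.

```python
import numpy as np

# Assumed frame rate of the motion-capture recording (illustrative, not from the corpus spec).
FRAME_RATE_HZ = 120.0

# Hypothetical plain-text export: one row per frame with the x, y, z position
# of one hand marker in millimetres.
trajectory = np.loadtxt("scenario01_actorA_hand_marker.csv", delimiter=",")  # shape (n_frames, 3)

# Per-frame displacement and instantaneous speed of the marker.
displacement = np.diff(trajectory, axis=0)                     # (n_frames - 1, 3), mm per frame
speed = np.linalg.norm(displacement, axis=1) * FRAME_RATE_HZ   # mm/s

# Simple summary statistics that could feed comparisons across actors or scenarios.
print(f"mean speed: {speed.mean():.1f} mm/s, peak speed: {speed.max():.1f} mm/s")
```

Such marker-level measures could, for example, be compared across repetitions of a scenario or fed into action-segmentation models, though any concrete pipeline would depend on the corpus's actual data format.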
