Objects in Action: An Approach for Combining Action Understanding and Object Perception

Analyzing videos of human-object interactions involves understanding human movements, locating and recognizing objects, and observing the effects of human movements on those objects. While each of these tasks can be carried out independently, recognition improves when the interactions between them are considered. Motivated by psychological studies of human perception, we present a Bayesian approach that unifies the inference processes involved in object classification and localization, action understanding, and perception of object reaction. Traditional approaches to object classification and action understanding have relied on shape features and movement analysis, respectively. By placing object classification and localization in a video interpretation framework, we can localize and classify objects that are either hard to localize due to clutter or hard to recognize due to a lack of discriminative features. Similarly, by applying context to human movements, drawn from the objects on which these movements impinge and from the effects of these movements, we can segment and recognize actions that are either too subtle to perceive or too hard to recognize using motion features alone.
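To make the idea of combining object and action evidence concrete, the following is a minimal sketch and not the model described in the paper. It assumes a toy setting with two object classes and two actions, per-cue likelihoods that are conditionally independent given the labels, and a hand-picked joint prior. The names `joint_posterior`, `obj_lik`, `act_lik`, and `compat`, and all numbers, are hypothetical and chosen only for illustration.

```python
import numpy as np

def joint_posterior(obj_lik, act_lik, compat):
    """Toy joint posterior over (object, action) pairs.

    Assumes P(o, a | evidence) is proportional to
    P(appearance | o) * P(motion | a) * P(o, a),
    i.e. the two cues are independent given the labels.
    """
    obj_lik = np.asarray(obj_lik, dtype=float)
    act_lik = np.asarray(act_lik, dtype=float)
    compat = np.asarray(compat, dtype=float)
    # Outer product of the two likelihoods, weighted by the joint prior.
    joint = obj_lik[:, None] * act_lik[None, :] * compat
    return joint / joint.sum()

# Hypothetical labels and numbers, for illustration only.
objects = ["cup", "spray bottle"]
actions = ["drinking", "spraying"]

obj_lik = [0.5, 0.5]      # appearance alone cannot tell the objects apart
act_lik = [0.7, 0.3]      # motion cue favours "drinking"
compat = [[0.45, 0.05],   # prior P(object, action): cups go with drinking,
          [0.05, 0.45]]   # spray bottles go with spraying

post = joint_posterior(obj_lik, act_lik, compat)
obj_marginal = post.sum(axis=1)   # P(object | all evidence)
act_marginal = post.sum(axis=0)   # P(action | all evidence)

for name, p in zip(objects, obj_marginal):
    print(f"P({name} | evidence) = {p:.2f}")
```

In this toy setting the appearance cue is uninformative, yet the object marginal shifts toward "cup" because the observed motion is more compatible with drinking. This is the kind of mutual context the abstract describes, reduced to a single multiplication and normalization.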
