Understanding manipulation in video

Manipulations are a significant subset of human gestures, distinguished by the fact that their logic and meaning are particularly clear, being heavily constrained by physical causality. We present techniques and causal semantics for interpreting video of manipulation tasks such as disassembly. Psychologically based causal constraints are used to detect meaningful changes in the integrity and motions of foreground-segmented blobs; a small causal model of manipulation is used to disambiguate and parse these into a coherent account of the video's action. The causal constraints are drawn from studies of infant perceptual development; as with infants, they precede and may even bootstrap the ability to reliably segment still objects. Our implementation produces a script of the causal evolution of the scene, output that supports cartoon summary, automated editing, and higher-level reasoning.
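The parsing idea described above can be sketched in miniature. The event names, rules, and data below are illustrative assumptions, not the paper's actual model: blob-integrity events (merge, split, motion) are mapped by simple causal rules into a script of manipulation actions.

```python
# Minimal sketch (assumptions: event vocabulary and rules are
# hypothetical, for illustration only). A sequence of blob-integrity
# events is parsed into a script of causal manipulation actions.

# Each observation: (frame, event, blobs involved)
observations = [
    (1, "appear", ("hand",)),
    (2, "move", ("hand",)),
    (3, "merge", ("hand", "object")),   # blobs join -> interpreted as grasp
    (4, "move", ("hand+object",)),      # joint motion -> carry
    (5, "split", ("hand+object",)),     # blobs separate -> release
]

def parse_script(obs):
    """Apply simple causal rules: a merge is read as a grasp, joint
    motion after a merge as a carry, and a subsequent split as a
    release. Returns a script of (frame, action) entries."""
    script, carrying = [], False
    for frame, event, blobs in obs:
        if event == "merge":
            script.append((frame, f"GRASP {blobs[1]}"))
            carrying = True
        elif event == "move" and carrying:
            script.append((frame, "CARRY"))
        elif event == "split" and carrying:
            script.append((frame, "RELEASE"))
            carrying = False
    return script

for frame, action in parse_script(observations):
    print(frame, action)
```

The point of the sketch is that physical causality sharply limits the interpretations: two blobs cannot move jointly unless they first merged, so a short rule set suffices to disambiguate the event stream.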