Movement, activity and action: the role of knowledge in the perception of motion.

This paper presents several approaches to the machine perception of motion and discusses the role and levels of knowledge in each. In particular, different techniques of motion understanding are described as focusing on one of three levels: movement, activity or action. Movements are the most atomic primitives, requiring no contextual or sequence knowledge to be recognized; movement is often addressed using either view-invariant or view-specific geometric techniques. Activity refers to sequences of movements or states, where the only real knowledge required is the statistics of the sequence; much of the recent work in gesture understanding falls within this category of motion perception. Finally, actions are larger-scale events, which typically include interaction with the environment and causal relationships; action understanding straddles the grey division between perception and cognition, and between computer vision and artificial intelligence. These levels are illustrated with examples drawn mostly from the group's work in understanding motion in video imagery. It is argued that the utility of such a division is that it makes explicit the representational competencies and manipulations necessary for perception.
