View-invariance in action recognition

Automatically understanding human actions from motion trajectories derived from video sequences is a very challenging problem. An action takes place in 3-D but is projected onto a 2-D image, so the projected 2-D trajectory varies with the viewpoint of the camera. As a result, the same action may have very different trajectories, and trajectories of different actions may look the same, which makes higher-level interpretation of trajectories ambiguous. However, if the representation of actions captures only characteristics that are view-invariant, then the higher-level interpretation can proceed without this ambiguity. Most current work on action recognition ignores the issue of view invariance, so the proposed methods do not succeed in more general situations. In this paper, we first present a view-invariant representation of action consisting of dynamic instants and intervals, which is computed using the spatiotemporal curvature of a trajectory. Our system then uses this representation to learn human actions without any training. The system is able to learn different actions incrementally, starting with no model. It can discover instances of the same action performed by different people and from different viewpoints.
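The abstract does not give the curvature formula, so the following is only a minimal sketch under stated assumptions. It treats a tracked 2-D trajectory as the space curve r(t) = (x(t), y(t), t), applies the standard curvature of a space curve, |r' x r''| / |r'|^3, and marks local curvature maxima above a threshold as candidate dynamic instants. The function names, the finite-difference derivatives, and the threshold are illustrative choices, not the authors' implementation.

```python
import numpy as np


def spatiotemporal_curvature(x, y):
    """Curvature of the curve (x(t), y(t), t), sampled once per frame.

    Assumption: uniform frame spacing, derivatives by finite differences.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    t = np.arange(len(x), dtype=float)

    r = np.stack([x, y, t], axis=1)   # shape (N, 3)
    d1 = np.gradient(r, axis=0)       # first derivative r'
    d2 = np.gradient(d1, axis=0)      # second derivative r''

    num = np.linalg.norm(np.cross(d1, d2), axis=1)
    # The time component of r' is 1, so the denominator is never zero.
    den = np.linalg.norm(d1, axis=1) ** 3
    return num / den


def dynamic_instants(kappa, threshold):
    """Indices of local curvature maxima above a (hypothetical) threshold."""
    k = np.asarray(kappa, dtype=float)
    return [
        i
        for i in range(1, len(k) - 1)
        if k[i] > threshold and k[i] >= k[i - 1] and k[i] > k[i + 1]
    ]


if __name__ == "__main__":
    # Synthetic trajectory: move right for 10 frames, then turn and move up.
    frames = np.arange(21)
    x = np.minimum(frames, 10)
    y = np.maximum(frames - 10, 0)

    kappa = spatiotemporal_curvature(x, y)
    print(dynamic_instants(kappa, threshold=0.3))
```

In this sketch the stretches between consecutive detected instants would play the role of intervals. Real trajectories are noisy, so some smoothing before differentiation would likely be needed; that step is omitted here.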
