View Invariance for Human Action Recognition

This paper presents an approach to viewpoint-invariant human action recognition, an area that has received scant attention relative to the overall body of work in human action recognition. It has been established previously that there exist no invariants for 3D to 2D projection. However, there exists a wealth of techniques in 2D invariance that can be used to advantage in 3D to 2D projection. We exploit these techniques and model actions in terms of view-invariant canonical body poses and trajectories in 2D invariance space, leading to a simple and effective way to represent and recognize human actions from a general viewpoint. We first evaluate the approach theoretically and show why a straightforward application of the 2D invariance idea will not work. We then describe strategies designed to overcome the inherent problems of the straightforward approach and outline the recognition algorithm. Finally, we present results on 2D projections of publicly available human motion capture data as well as on manually segmented real image sequences. In addition to being robust to viewpoint change, the approach handles different people, minor variations within a given action, and differences in the speed of the action (and hence in frame rate), while still encoding sufficient distinction among actions.
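The abstract does not say which 2D invariants the method relies on. As a generic illustration of what "2D invariance" means, the sketch below uses the cross ratio of four collinear points, a classic quantity preserved by any plane projective transformation (homography). The function names, the sample points, and the matrix `H` are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def cross_ratio(a, b, c, d):
    """Cross ratio (|AC| * |BD|) / (|BC| * |AD|) of four collinear 2D points."""
    a, b, c, d = (np.asarray(p, dtype=float) for p in (a, b, c, d))
    dist = lambda p, q: np.linalg.norm(p - q)
    return (dist(a, c) * dist(b, d)) / (dist(b, c) * dist(a, d))

def apply_homography(H, points):
    """Map an (N, 2) array of 2D points through a 3x3 homography H."""
    pts = np.asarray(points, dtype=float)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homogeneous @ H.T
    # Divide by the third coordinate to return to inhomogeneous 2D points.
    return mapped[:, :2] / mapped[:, 2:3]

# Four collinear points (hypothetical data): (1, 2) + t * (2, 1), t = 0, 1, 2, 4.
points = np.array([[1.0, 2.0], [3.0, 3.0], [5.0, 4.0], [9.0, 6.0]])

# An arbitrary homography standing in for a change of viewpoint (assumed values).
H = np.array([[1.0, 0.2, 3.0],
              [0.1, 0.9, -1.0],
              [0.001, 0.002, 1.0]])

before = cross_ratio(*points)
after = cross_ratio(*apply_homography(H, points))
print(before, after)
```

The two printed values agree up to floating-point error even though the individual point coordinates change, which is the property that makes such quantities attractive as view-independent descriptors.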
