View-invariant human activity recognition based on shape and motion features

Recognizing human activities from image sequences is an active area of research in computer vision. Most previous work on activity recognition focuses on recognition from a single view and ignores the issue of view invariance. In this paper, we present a view-invariant human activity recognition approach that uses both motion and shape information to represent activities. For each frame in the video, a 128-dimensional optical flow vector computed over the region of interest represents the motion of the human body, and a 90-dimensional eigen-shape vector represents its shape. Each activity is represented by a set of hidden Markov models (HMMs), one per viewing direction, so that the activity can be recognized regardless of the view. Experiments on a database of video clips of different activities show that our method is robust.
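The recognition scheme described above can be illustrated with a minimal sketch. The abstract does not say how the two per-frame vectors are combined, what emission model the HMMs use, or how the final decision is made. The sketch therefore assumes that the 128-dimensional flow vector and the 90-dimensional eigen-shape vector are concatenated, that each state emits a diagonal-covariance Gaussian, and that a clip is assigned the activity of the best-scoring (activity, view) model. All names (`frame_feature`, `DiagGaussianHMM`, `classify`) are hypothetical, and model training is omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

FLOW_DIM, SHAPE_DIM = 128, 90  # feature sizes stated in the abstract


def frame_feature(flow: Sequence[float], shape: Sequence[float]) -> List[float]:
    """Build one per-frame observation (assumption: simple concatenation)."""
    assert len(flow) == FLOW_DIM and len(shape) == SHAPE_DIM
    return list(flow) + list(shape)


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) for log-domain probabilities."""
    peak = max(values)
    if peak == -math.inf:
        return -math.inf
    return peak + math.log(sum(math.exp(v - peak) for v in values))


class DiagGaussianHMM:
    """HMM with diagonal-Gaussian emissions (an assumed emission model)."""

    def __init__(self, log_start, log_trans, means, variances):
        self.log_start = log_start    # [n_states] log initial probabilities
        self.log_trans = log_trans    # [n_states][n_states] log transitions
        self.means = means            # [n_states][dim] emission means
        self.variances = variances    # [n_states][dim] emission variances

    def _log_emit(self, state: int, x: Sequence[float]) -> float:
        """Log density of observation x under the Gaussian of `state`."""
        return sum(
            -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
            for xi, m, v in zip(x, self.means[state], self.variances[state])
        )

    def log_likelihood(self, frames: Sequence[Sequence[float]]) -> float:
        """Forward algorithm in the log domain: log P(frames | model)."""
        n = len(self.log_start)
        alpha = [self.log_start[s] + self._log_emit(s, frames[0]) for s in range(n)]
        for x in frames[1:]:
            alpha = [
                logsumexp([alpha[p] + self.log_trans[p][s] for p in range(n)])
                + self._log_emit(s, x)
                for s in range(n)
            ]
        return logsumexp(alpha)


def classify(frames: Sequence[Sequence[float]],
             models: Dict[Tuple[str, str], DiagGaussianHMM]) -> str:
    """Score the clip against every (activity, view) HMM and return the
    activity of the best-scoring model; the view label is discarded."""
    best_key = max(models, key=lambda key: models[key].log_likelihood(frames))
    return best_key[0]
```

Because the maximum is taken over all views of all activities, the clip never needs to be labeled with its viewing direction at test time; this is one plausible reading of how a set of per-view HMMs yields view-invariant recognition.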
