Towards fast, view-invariant human action recognition

In this paper, we propose a fast method to recognize human actions which accounts for intra-class variability in the way an action is performed. We propose the use of a low dimensional feature vector which consists of (a) the projections of the width profile of the actor on to an ldquoaction basisrdquo and (b) simple spatio-temporal features. The action basis is built using eigenanalysis of walking sequences of different people. Given the limited amount of training data, Dynamic Time Warping (DTW) is used to perform recognition. We propose the use of the average-template with multiple features, first used in speech recognition, to better capture the intra-class variations for each action. We demonstrate the efficacy of this algorithm using our low dimensional feature to robustly recognize human actions. Furthermore, we show that view-invariant recognition can be performed by using a simple data fusion of two orthogonal views. For the actions that are still confusable, a temporal discriminative weighting scheme is used to distinguish between them. The effectiveness of our method is demonstrated by conducting experiments on the multi-view IXMAS dataset of persons performing various actions.

[1]  Ronen Basri,et al.  Actions as space-time shapes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[2]  Yanxi Liu,et al.  Gait Sequence Analysis Using Frieze Patterns , 2002, ECCV.

[3]  Ramakant Nevatia,et al.  Single View Human Action Recognition using Key Pose Matching and Viterbi Path Searching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Biing-Hwang Juang,et al.  Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.

[5]  Jiri Matas,et al.  On Combining Classifiers , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[6]  Ronen Basri,et al.  Shape representation and classification using the Poisson equation , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[7]  Rama Chellappa,et al.  View invariants for human action recognition , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[8]  Larry S. Davis,et al.  Non-parametric Model for Background Subtraction , 2000, ECCV.

[9]  W. Eric L. Grimson,et al.  Adaptive background mixture models for real-time tracking , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[10]  Rama Chellappa,et al.  From Videos to Verbs : Mining Videos for Events using a Cascade of Dynamical Systems , 2007 .

[11]  Wayne H. Ward,et al.  Speech recognition , 1997 .

[12]  Mubarak Shah,et al.  Chaotic Invariants for Human Action Recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[13]  I. Patras,et al.  Spatiotemporal salient points for visual recognition of human actions , 2005, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[14]  Ashok Veeraraghavan,et al.  The Function Space of an Activity , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[15]  Rama Chellappa,et al.  From Videos to Verbs: Mining Videos for Activities using a Cascade of Dynamical Systems , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Rama Chellappa,et al.  Gait Analysis for Human Identification , 2003, AVBPA.

[17]  Fritz Class,et al.  A learning procedure for speaker-dependent word recognition systems based on sequential processing of input tokens , 1983, ICASSP.

[18]  James W. Davis,et al.  The Recognition of Human Movement Using Temporal Templates , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[19]  Rémi Ronfard,et al.  Action Recognition from Arbitrary Views using 3D Exemplars , 2007, 2007 IEEE 11th International Conference on Computer Vision.