View Invariant Activity Recognition with Manifold Learning

Activity recognition in complex scenes is challenging because human actions are unconstrained and may be observed from multiple views. While progress has been made in recognizing activities from fixed views, view-invariant recognition methods need further research. Furthermore, recognizing and classifying activities requires processing data in both the space and time domains, which produces large amounts of data and is computationally expensive. To accommodate view invariance and high-dimensional data, we propose the use of Manifold Learning with Locality Preserving Projections (LPP). We develop an efficient set of features based on radial distance and present a Manifold Learning framework for learning low-dimensional representations of action primitives that can be used to recognize activities from multiple views. Using our approach, we report high recognition rates on the INRIA IXMAS dataset.
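As a rough illustration of the dimensionality-reduction step, the following is a minimal sketch of standard Locality Preserving Projections, not the implementation used in this work. The function name `lpp`, the k-nearest-neighbour graph, the heat-kernel width `t`, the regularization term `reg`, and the random input standing in for radial-distance feature vectors are all assumptions made for the example.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist


def lpp(X, n_components=2, n_neighbors=5, t=1.0, reg=1e-6):
    """Sketch of Locality Preserving Projections.

    X is (n_samples, n_features). Returns a projection matrix of shape
    (n_features, n_components). All parameter defaults are illustrative.
    """
    n, d = X.shape
    dist2 = cdist(X, X, metric="sqeuclidean")

    # Indices of the k nearest neighbours of each sample. Column 0 of the
    # sorted order is the sample itself (assuming no duplicate samples).
    idx = np.argsort(dist2, axis=1)[:, 1:n_neighbors + 1]

    # Heat-kernel weights on the neighbourhood graph, then symmetrise.
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = idx.ravel()
    W[rows, cols] = np.exp(-dist2[rows, cols] / t)
    W = np.maximum(W, W.T)

    D = np.diag(W.sum(axis=1))  # degree matrix
    L = D - W                   # graph Laplacian

    # Generalized eigenproblem  X^T L X a = lambda X^T D X a.
    A = X.T @ L @ X
    B = X.T @ D @ X + reg * np.eye(d)  # small ridge keeps B positive definite
    A = (A + A.T) / 2
    B = (B + B.T) / 2

    # eigh returns eigenvalues in ascending order; the eigenvectors with the
    # smallest eigenvalues give the locality-preserving directions.
    _, eigvecs = eigh(A, B)
    return eigvecs[:, :n_components]


# Hypothetical usage: random vectors stand in for per-frame feature vectors.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 10))
P_demo = lpp(X_demo, n_components=3)
Y_demo = X_demo @ P_demo  # low-dimensional embedding, shape (60, 3)
```

Because LPP yields a linear map, a new observation can be embedded with a single matrix product, which is one reason it suits recognition on unseen frames.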
