A Multiattribute Sparse Coding Approach for Action Recognition From a Single Unknown Viewpoint

We propose a novel approach for view-independent action recognition using multiattribute sparse representation enforced with group constraints. First, an oversegmentation-based background modeling and foreground detection approach is employed to extract silhouettes from action videos. Then, motion history images are computed over multiple time intervals to capture motion and pose information in human activities. To obtain a more accurate and discriminative representation, we propose multiattribute sparse representation for multiview action video classification. Actions with multiple attributes can be represented by individual attribute matrices that describe the group property of each action instance. These attribute matrices are incorporated into the formulation of the l1-minimization. The sparsity property together with the group constraints makes the basis selection in sparse coding more accurate. Notably, our approach can operate when the attributes in the training data are only partially labeled. Finally, experiments on three multiview human action datasets demonstrate the effectiveness and robustness of the proposed method.
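To make the motion history step concrete, the following is a minimal sketch of the standard motion history image update applied to binary silhouettes for several time intervals. It is an illustration under stated assumptions, not the paper's implementation: the function names, the interval values in `taus`, and the normalization to [0, 1] are hypothetical choices, and the abstract does not specify how the intervals are selected or combined.

```python
import numpy as np

def motion_history_image(silhouettes, tau):
    """Standard MHI update over a (T, H, W) boolean silhouette stack.

    Pixels inside the current silhouette are set to tau; all other
    pixels decay by 1 per frame, floored at 0.
    """
    mhi = np.zeros(silhouettes.shape[1:], dtype=np.float32)
    for mask in silhouettes:
        mhi = np.where(mask, float(tau), np.maximum(mhi - 1.0, 0.0))
    return mhi / tau  # assumed normalization to [0, 1]

def multi_interval_mhi(silhouettes, taus=(2, 4, 8)):
    """Stack one MHI per time interval (the tau values are assumptions)."""
    return np.stack([motion_history_image(silhouettes, t) for t in taus])

# Toy input: one foreground pixel moving right across a 1x4 frame.
frames = np.zeros((4, 1, 4), dtype=bool)
for t in range(4):
    frames[t, 0, t] = True

features = multi_interval_mhi(frames, taus=(2, 4))
print(features.shape)  # one MHI per interval
```

A short interval keeps only the most recent motion, while a longer one retains a fading trail of earlier poses, which is one plausible reading of why several intervals are used together.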
