Learning spatio-temporal dependencies for action recognition

In this paper, we propose a spatio-temporal dependencies learning (STDL) method for action recognition. Inspired by the self-organizing map (SOM), our method learns implicit spatio-temporal dependencies from sequential action feature sets while preserving the intrinsic topologies that characterize human actions. A further advantage is its ability to project high-dimensional action features onto a lower-dimensional latent neural distribution, which significantly reduces computational cost and data redundancy in both learning and recognition. An ensemble learning strategy based on expectation-maximization (EM) is adopted to estimate the latent parameters of the STDL model. The effectiveness and robustness of the proposed model are verified through extensive experiments on several benchmark datasets.
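The abstract does not give STDL's update equations. The sketch below therefore shows only the classic Kohonen SOM idea it builds on. It is plain Python and makes several assumptions that do not come from the paper: the helper names (`train_som`, `best_matching_unit`, `neural_distribution`) are hypothetical, the grid size and the linear learning-rate and radius decay are arbitrary choices, and the "neural distribution" is approximated as a normalized histogram of best-matching units over a frame sequence. The temporal-dependency modelling and the EM-based ensemble estimation are not shown.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate of the neuron whose weight vector is closest to x."""
    return min(weights, key=lambda rc: sum((w - v) ** 2 for w, v in zip(weights[rc], x)))


def train_som(data, grid_h, grid_w, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a classic (Kohonen) SOM with a Gaussian neighbourhood on a 2-D grid.

    Hypothetical helper: schedules and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    if sigma0 is None:
        sigma0 = max(grid_h, grid_w) / 2.0
    # Initialise every neuron with a copy of a randomly chosen training vector.
    weights = {(r, c): list(rng.choice(data))
               for r in range(grid_h) for c in range(grid_w)}
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            # Linearly decay the learning rate and the neighbourhood radius.
            frac = step / total
            lr = lr0 * (1.0 - frac)
            sigma = sigma0 * (1.0 - frac) + 0.1  # floor keeps the radius > 0
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Gaussian neighbourhood measured on the grid, not in feature space.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                for k, xk in enumerate(x):
                    w[k] += lr * h * (xk - w[k])
            step += 1
    return weights


def neural_distribution(weights, sequence):
    """Hypothetical projection: normalised histogram of BMU hits over a frame sequence."""
    counts = {rc: 0 for rc in weights}
    for x in sequence:
        counts[best_matching_unit(weights, x)] += 1
    return {rc: n / len(sequence) for rc, n in counts.items()}


# Toy data (invented for illustration): two well-separated groups of 10-D frame features.
rng = random.Random(1)
cluster_a = [[rng.gauss(0.0, 0.1) for _ in range(10)] for _ in range(30)]
cluster_b = [[rng.gauss(5.0, 0.1) for _ in range(10)] for _ in range(30)]
som = train_som(cluster_a + cluster_b, grid_h=3, grid_w=3)
dist_a = neural_distribution(som, cluster_a)  # 10-D frames -> 9-bin distribution
```

Because the neighbourhood weight depends on distance between grid coordinates, neurons adjacent on the map are pulled toward similar inputs. This is the kind of topology preservation the abstract alludes to. Each sequence is then summarized by a 9-bin distribution instead of its raw 10-D frames, which gives a rough sense of the dimensionality reduction the abstract claims.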
