DSPM: Dynamic Structure Preserving Map for action recognition

In this paper, a Dynamic Structure Preserving Map (DSPM) is proposed to effectively recognize human actions in video sequences. Inspired by recent feature learning methods, we modify and improve the adaptive learning procedure of the self-organizing map (SOM) to capture the dynamics of best-matching neurons through a Markov random walk. The DSPM learns implicit spatial-temporal correlations from sequential action feature sets and preserves the intrinsic topologies characterized by different human motions. A further advantage of the DSPM is its ability to learn low-level features from challenging video data. The projection from high-dimensional action features to a low-dimensional latent neural distribution significantly reduces the computational cost and data redundancy of the recognition process. The effectiveness and robustness of the proposed method are verified through extensive experiments on several benchmark datasets.
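To make the two ingredients named in the abstract concrete, the following is a minimal sketch of a standard SOM combined with an empirical transition matrix over best-matching neurons. It is not the DSPM learning rule, which the abstract does not specify. The function names (`train_som`, `bmu_sequence`, `transition_matrix`), the grid size, the linear decay schedules, and the random stand-in features are all assumptions made for illustration.

```python
import numpy as np

def train_som(features, grid=(4, 4), epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Train a plain SOM (assumed setup); returns weights of shape (rows*cols, dim)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid coordinates of every neuron, used by the neighborhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    weights = rng.normal(size=(rows * cols, features.shape[1]))
    n_steps = epochs * len(features)
    step = 0
    for _ in range(epochs):
        for x in features:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3     # linearly shrinking neighborhood width
            # Best-matching neuron: the one whose weight vector is closest to x.
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            dist2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))  # Gaussian neighborhood on the grid
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def bmu_sequence(features, weights):
    """Index of the best-matching neuron for each frame-level feature vector."""
    d = np.linalg.norm(features[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(d, axis=1)

def transition_matrix(bmus, n_neurons, smoothing=1e-6):
    """Row-stochastic matrix of transitions between consecutive best-matching neurons."""
    counts = np.full((n_neurons, n_neurons), smoothing)
    for a, b in zip(bmus[:-1], bmus[1:]):
        counts[a, b] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)

# Random stand-in for a sequence of 50 frame-level action features (assumption).
features = np.random.default_rng(1).normal(size=(50, 8))
weights = train_som(features)
bmus = bmu_sequence(features, weights)
P = transition_matrix(bmus, n_neurons=weights.shape[0])
```

Here each row of `P` is a probability distribution over the next best-matching neuron, which is one simple way to view the sequence of winning neurons as a Markov random walk on the map.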
