HT-GSOM: Dynamic Self-organizing Map with Transience for Human Activity Recognition

Recognition of complex human activities is a prominent research area in intelligent video surveillance. Current state-of-the-art techniques are largely based on supervised deep learning, yet the inability to learn from unlabeled video streams is a key shortcoming of supervised techniques in applications where large volumes of unlabeled video data must be processed. Furthermore, the dominant focus on persistence in traditional machine learning algorithms induces two limitations: outdated information continues to influence memory-guided decision making, and acquired knowledge overfits to specific past events, weakening the plasticity of the learning system. To address these requirements, we propose a new adaptation of the Growing Self-Organizing Map (GSOM), arranged in a hierarchical two-stream learning pipeline that accommodates unlabeled video data for human activity recognition. The model implements a transience property that provides plasticity without sacrificing the stability of the learning system. The proposed model is evaluated on two benchmark video datasets, confirming its validity and usability for human activity recognition.
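To make the core idea concrete, the following is a minimal, hypothetical sketch of a growing self-organizing map augmented with a transience mechanism: accumulated quantization error drives node growth (as in the standard GSOM), while an exponential decay on error and an age-based pruning step let the map forget stale structure. All parameter names (`spread_factor`, `decay`, `prune_age`) and the 0-D (non-lattice) topology are simplifying assumptions for illustration; the paper's actual HT-GSOM hierarchy and two-stream pipeline are not reproduced here.

```python
import numpy as np

class TransientGSOMSketch:
    """Illustrative GSOM-like map with a transience (forgetting) term.

    NOTE: a simplified sketch, not the HT-GSOM of the paper -- the lattice
    neighborhood, hierarchy, and two-stream inputs are omitted.
    """

    def __init__(self, dim, spread_factor=0.5, decay=0.95, prune_age=50, seed=0):
        rng = np.random.default_rng(seed)
        self.nodes = rng.random((2, dim))      # start with two random nodes
        self.errors = np.zeros(2)              # accumulated quantization error
        self.ages = np.zeros(2)                # steps since each node last won
        # GSOM growth threshold: GT = -D * ln(spread_factor)
        self.growth_threshold = -dim * np.log(spread_factor)
        self.decay = decay
        self.prune_age = prune_age

    def train_step(self, x, lr=0.1):
        d = np.linalg.norm(self.nodes - x, axis=1)
        w = int(np.argmin(d))                      # best-matching node
        self.nodes[w] += lr * (x - self.nodes[w])  # move winner toward input
        self.errors[w] += d[w]                     # accumulate its error
        self.ages += 1
        self.ages[w] = 0
        self.errors *= self.decay                  # transience: old error fades
        if self.errors[w] > self.growth_threshold:
            # grow a new node next to the overloaded winner
            self.nodes = np.vstack([self.nodes, self.nodes[w] + 0.01])
            self.errors = np.append(self.errors, 0.0)
            self.ages = np.append(self.ages, 0.0)
            self.errors[w] = 0.0
        fresh = self.ages < self.prune_age         # transience: drop stale nodes
        if fresh.sum() >= 2:                       # always keep at least two
            self.nodes = self.nodes[fresh]
            self.errors = self.errors[fresh]
            self.ages = self.ages[fresh]
        return w
```

The decay term bounds each node's accumulated error at roughly `d / (1 - decay)`, so regions that stop receiving inputs lose their influence on growth, while the age-based pruning removes nodes that no longer win at all; together these approximate the plasticity-with-stability trade-off the abstract describes.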
