Real-time human action recognition using stacked sparse autoencoders

Objectives: This paper develops an automated real-time system that detects humans using Histogram of Oriented Gradients (HOG) features and recognizes their actions using stacked sparse autoencoders. Methods: For human detection, an SVM classifier is trained on HOG feature descriptors and then used to identify humans in the frames. Stacked sparse autoencoders, a category of deep neural networks, are used in the proposed work to extract features of human actions from a human action video dataset. The extracted features form a dictionary that is used to map the input and represent it as a linear combination; softmax classification is then applied to train the model. To reduce computational complexity, input frames are converted into binary temporal difference images before being fed to the neural network. Analysis: The proposed model performed comparably to other state-of-the-art models applied to human action recognition problems. Applications: The study shows that using multiple layers can improve classification performance: 75% with a two-layer model and 83% with a three-layer model.
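
The binary temporal difference step mentioned in the Methods can be illustrated with a minimal sketch. It assumes grayscale frames stored as a NumPy array of shape (T, H, W); the function name `binary_temporal_difference`, the threshold of 25 intensity levels, and the toy clip are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def binary_temporal_difference(frames, threshold=25):
    """Turn a grayscale clip of shape (T, H, W) into T-1 binary motion masks.

    The threshold is a hypothetical value chosen for illustration only.
    """
    frames = np.asarray(frames)
    if frames.ndim != 3 or frames.shape[0] < 2:
        raise ValueError("expected at least two grayscale frames shaped (T, H, W)")
    # Widen to a signed type so uint8 subtraction cannot wrap around.
    signed = frames.astype(np.int16)
    diff = np.abs(signed[1:] - signed[:-1])
    # Pixels whose intensity changed by more than the threshold are marked 1.
    return (diff > threshold).astype(np.uint8)


# Toy clip: a bright 2x2 block moves one pixel to the right between two frames.
clip = np.zeros((2, 4, 4), dtype=np.uint8)
clip[0, 1:3, 0:2] = 200
clip[1, 1:3, 1:3] = 200

masks = binary_temporal_difference(clip)
# One flattened row per difference image (an assumed input layout for the network).
flat_inputs = masks.reshape(masks.shape[0], -1)
```

Each mask keeps only the pixels that changed between consecutive frames, so static background is zeroed out and every pixel is reduced to a single bit, which is one plausible reading of how this step lowers the computational load. Flattening each mask into a vector is likewise an assumption about how the images would be presented to an autoencoder.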