Activity Recognition Using Temporal Optical Flow Convolutional Features and Multilayer LSTM

Nowadays, digital surveillance systems are installed almost everywhere and continuously collect enormous amounts of data, which then require human monitoring to identify different activities and events. Smarter surveillance, in which normal and abnormal activities are identified automatically using artificial intelligence and computer vision, is therefore needed. In this paper, we propose a framework for activity recognition in surveillance videos captured in industrial systems. The continuous surveillance video stream is first divided into shots, and the important shots are selected using the proposed convolutional neural network (CNN) based human saliency features. Next, the temporal features of an activity in the sequence of frames are extracted using the convolutional layers of a FlowNet2 CNN model. Finally, a multilayer long short-term memory (LSTM) network is presented that learns long-term sequences in the temporal optical flow features for activity recognition. Experiments are conducted on different benchmark action and activity recognition datasets, and the results show the effectiveness of the proposed method for activity recognition in industrial settings compared with state-of-the-art methods. The code is available at https://github.com/Aminullah6264/Activity_Rec_ML-LSTM.
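To illustrate the final stage, the following is a minimal NumPy sketch of a multilayer (stacked) LSTM that classifies a sequence of per-frame feature vectors: each layer consumes the hidden-state sequence of the layer below, and the top layer's last hidden state is mapped to class probabilities. It is not the authors' implementation. The random untrained weights, the gate ordering (input, forget, candidate, output), classification from the last time step, and all sizes (15 time steps, 512-dimensional vectors standing in for the FlowNet2 convolutional features, three layers of 128 units, 5 classes) are assumptions chosen only to show the data flow.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_lstm_layer(rng, input_dim, hidden_dim):
    """Random (untrained) weights for one LSTM layer.
    The four gates are stacked row-wise in the assumed order (i, f, g, o)."""
    scale = 1.0 / np.sqrt(hidden_dim)
    W = rng.uniform(-scale, scale, size=(4 * hidden_dim, input_dim))
    U = rng.uniform(-scale, scale, size=(4 * hidden_dim, hidden_dim))
    b = np.zeros(4 * hidden_dim)
    return W, U, b

def lstm_layer_forward(xs, W, U, b):
    """Run one LSTM layer over a (T, input_dim) sequence; return all hidden states (T, hidden_dim)."""
    H = U.shape[1]
    h = np.zeros(H)
    c = np.zeros(H)
    outputs = []
    for x_t in xs:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[:H])             # input gate
        f = sigmoid(z[H:2 * H])        # forget gate
        g = np.tanh(z[2 * H:3 * H])    # candidate cell state
        o = sigmoid(z[3 * H:])         # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)

def multilayer_lstm_classify(features, layers, W_out, b_out):
    """Stacked LSTM: each layer consumes the previous layer's hidden-state sequence.
    The top layer's final hidden state is mapped to class probabilities with a softmax."""
    seq = features
    for W, U, b in layers:
        seq = lstm_layer_forward(seq, W, U, b)
    logits = W_out @ seq[-1] + b_out
    logits = logits - logits.max()     # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Hypothetical sizes; random vectors stand in for per-frame FlowNet2 conv features.
rng = np.random.default_rng(0)
T, feat_dim, hidden_dim, num_layers, num_classes = 15, 512, 128, 3, 5
features = rng.standard_normal((T, feat_dim))

layers = []
in_dim = feat_dim
for _ in range(num_layers):
    layers.append(init_lstm_layer(rng, in_dim, hidden_dim))
    in_dim = hidden_dim  # upper layers take the hidden states of the layer below

W_out = rng.uniform(-0.1, 0.1, size=(num_classes, hidden_dim))
b_out = np.zeros(num_classes)

probs = multilayer_lstm_classify(features, layers, W_out, b_out)
print(probs.shape, float(probs.sum()))
```

Stacking layers this way lets the upper layers model longer-range structure in the lower layers' outputs. In practice the weights would be learned by backpropagation through time on labeled clips rather than drawn at random.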
