Complex Behavior Recognition Based on Convolutional Neural Network: A Survey

Behavior recognition is an important research direction in computer vision, and behavior recognition based on convolutional neural networks has become a research hotspot in recent years. Methods based on convolutional neural networks can extract features directly from video data, reducing the differences in the temporal domain and the influence of spatial complexity. At present, simple behavior recognition based on convolutional neural networks has basically been solved. However, complex behavior recognition based on convolutional neural networks still faces many difficulties. This paper first discusses the issues of spatial and temporal dependencies in complex behavior recognition. It then analyzes in detail the application of convolutional neural networks to complex behavior recognition from the temporal, spatial, and spatio-temporal aspects, following the progress of research. Finally, it outlines future directions for complex behavior recognition based on convolutional neural networks.
