Human action recognition based on scene semantics

Indoor security is as critical a problem as outdoor security, and human action recognition in indoor areas remains an active research topic. Most studies on human action recognition ignore the semantic information of the scene, even though indoor scenes contain a wide variety of semantics. Depth sensors, which provide both color and depth data, are well suited to extracting the semantic context of human actions. This paper therefore proposes an indoor action recognition method that uses Kinect and is based on the semantics of the scene. First, we propose a trajectory clustering algorithm for three-dimensional (3D) scenes that combines several characteristics of people's movement: spatial location, movement direction, and speed. Based on the clustering results and the scene context, we derive a region of interest (ROI) extraction method for indoor scenes, and dynamic time warping (DTW) is used to analyze abnormal action sequences. Finally, 3D motion history image (3D-MHI) features computed from the color and depth data are combined with the semantic context of the scene to recognize human actions. In the experiments, the method was tested on two datasets, and the results show that our semantics-based method outperforms the methods it was compared against.
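To make the DTW step concrete, the following is a minimal sketch of the classic dynamic time warping distance between two 3D trajectories. It is not the paper's implementation: the function name `dtw_distance`, the use of Euclidean distance between points, and the sample trajectories are all assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Classic DTW cost between two sequences of same-dimension points.

    Hypothetical helper for illustration; the local cost is the
    Euclidean distance between points, which is an assumption.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Illustrative (made-up) 3D trajectories as (x, y, z) points.
reference = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
slower = [(0, 0, 0), (1, 0, 0), (1, 0, 0), (2, 0, 0)]  # same path, one point repeated
different = [(0, 0, 0), (0, 1, 0), (0, 2, 0)]          # a different path

same_path_cost = dtw_distance(reference, slower)
other_path_cost = dtw_distance(reference, different)
```

Because DTW allows one sequence to pause while the other advances, the time-stretched copy of the reference aligns at zero cost, whereas the trajectory along a different path accumulates a positive cost. One plausible use, again an assumption rather than the paper's stated procedure, is to flag a sequence as abnormal when its DTW cost to every known normal sequence exceeds a chosen threshold.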
