A Novel Double-Layer Framework for Joint Segmentation and Recognition of Multiple Actions

This paper addresses the problem of joint segmentation and recognition of multiple actions in a long-term video. Since features obtained from a single frame cannot describe human motion over a period of time, some prior work first divides a long-term video into many fixed-length video clips and represents the video as a sequence of these clips. However, a fixed-length video clip may contain frames from two adjacent actions, which can significantly degrade the performance of action segmentation and recognition. In this paper, we develop a double-layer framework for segmenting and recognizing multiple actions in a long-term video. In the first layer, a novel unsupervised method based on the directions of velocity is proposed to initially divide an input video into a series of variable-length clips. The second layer takes the sequence of video clips as input and employs a joint segmentation and recognition method to group the clips into several segments while simultaneously labeling each segment with an action category. Experiments conducted on the IXMAS action dataset verify the effectiveness of the proposed approach.
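
To make the first-layer idea concrete, the following is a minimal sketch of splitting a motion trajectory into variable-length clips wherever the direction of velocity changes sharply. It is an illustration only, not the method proposed in the paper, whose details are not given in this abstract. The function name `split_by_velocity_direction`, the use of a single 2D point per frame, and the `angle_threshold_deg` parameter are all assumptions made for this example.

```python
import math


def split_by_velocity_direction(positions, angle_threshold_deg=90.0):
    """Split a 2D trajectory into variable-length clips (hypothetical sketch).

    positions: list of (x, y) tuples, one tracked point per frame.
    Returns a list of (start, end) frame-index pairs with end exclusive.
    A new clip starts at the frame where the velocity direction turns by
    more than angle_threshold_deg relative to the last non-zero velocity.
    """
    n = len(positions)
    if n == 0:
        return []

    # velocities[i] is the displacement from frame i to frame i + 1.
    velocities = [
        (positions[i + 1][0] - positions[i][0],
         positions[i + 1][1] - positions[i][1])
        for i in range(n - 1)
    ]

    boundaries = [0]
    prev = None  # last velocity with non-zero magnitude
    for i, v in enumerate(velocities):
        speed = math.hypot(v[0], v[1])
        if speed == 0.0:
            continue  # direction is undefined for a stationary step
        if prev is not None:
            prev_speed = math.hypot(prev[0], prev[1])
            cos_angle = (prev[0] * v[0] + prev[1] * v[1]) / (prev_speed * speed)
            cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
            angle = math.degrees(math.acos(cos_angle))
            if angle > angle_threshold_deg:
                boundaries.append(i)  # the turning frame opens a new clip
        prev = v

    boundaries.append(n)
    return [(boundaries[k], boundaries[k + 1]) for k in range(len(boundaries) - 1)]


if __name__ == "__main__":
    # A point moves right for three steps, then reverses direction.
    track = [(0, 0), (1, 0), (2, 0), (3, 0), (2, 0), (1, 0), (0, 0)]
    print(split_by_velocity_direction(track))
```

Unlike fixed-length clipping, the clip boundaries here follow the motion itself, so a clip is less likely to straddle two adjacent actions. In a full pipeline, the resulting clips would then be passed to the second layer for joint grouping and labeling.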
