Human Action Segmentation Based on a Streaming Uniform Entropy Slice Method

Segmentation of human actions is a major research problem in video understanding. Several existing approaches show that performing action segmentation before action recognition improves recognition performance. In this paper, we address the problem of action segmentation in an online manner. We first extend a clustering-based image segmentation approach to the temporal domain, generating hierarchical supervoxel levels for action segmentation. We then propose a streaming approach, based on the uniform entropy slice, that flattens the hierarchical levels into a single level while preserving the important information in the video. The flattened level contains the silhouette of the human, with the body parts assigned distinct labels. We then combine this human structure information with the original video frames to “strengthen” the action in the video, which paves the way for accurate action recognition. Experimental results show that our online approach achieves satisfactory performance on action segmentation or recognition across several publicly available data sets, including the DAVIS, UCF Sports, and KTH data sets.
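To make the idea of flattening a supervoxel hierarchy more concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes each node of the hierarchy carries a histogram of some feature (for example, quantized motion directions) and replaces the uniform entropy slice optimization with a simple greedy top-down rule: a node is kept if its Shannon entropy is at most a threshold, otherwise its children are visited. The names `Node`, `shannon_entropy`, `greedy_slice`, and `max_entropy` are hypothetical and introduced only for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One supervoxel in a hypothetical hierarchy (illustrative only)."""
    label: int
    histogram: list  # feature counts inside this supervoxel
    children: list = field(default_factory=list)


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a histogram given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def greedy_slice(node, max_entropy):
    """Return the labels of a slice through the tree.

    Every root-to-leaf path contains exactly one selected node. This greedy
    rule is a simplified stand-in, assumed here for illustration; it is not
    the uniform entropy slice optimization itself.
    """
    if not node.children or shannon_entropy(node.histogram) <= max_entropy:
        return [node.label]
    labels = []
    for child in node.children:
        labels.extend(greedy_slice(child, max_entropy))
    return labels


# A coarse supervoxel mixing two motion bins, split into two pure children.
root = Node(0, [4, 4], [Node(1, [4, 0]), Node(2, [0, 4])])
fine = greedy_slice(root, max_entropy=0.5)    # descends to the children
coarse = greedy_slice(root, max_entropy=1.0)  # keeps the root
```

With a low threshold the slice descends to the finer, more homogeneous supervoxels; with a high threshold it stays at the coarse level. A single flattened level can therefore mix nodes from different levels of the hierarchy.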
