Recognize human activities from multi-part missing videos

Recognizing human activities from videos with multiple missing parts is a challenging problem. When the missing parts are contiguous, the problem reduces to activity recognition in videos with a single missing part at an arbitrary position, which has been the focus of much prior research. In many practical applications, however, several temporal gaps often appear in captured videos because of random frame loss (e.g., noise interference). To address this problem, we propose a novel framework that 1) divides each video into multiple equal-length segments, from which local spatio-temporal features are extracted; 2) concatenates the over-complete dictionaries of the segments to form combinatorial sparse activity dictionaries; 3) computes the combinatorial sparse coefficients of each segment based on these activity dictionaries; and 4) formulates the probability of each activity to estimate the correct class. In our experiments, the framework achieves superior performance not only on videos with a single missing part at any position, but also on videos with multiple missing parts.
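The four steps can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not specify the features, the dictionary learning procedure, or the probability formulation. The sketch therefore assumes a per-segment mean of frame features as a stand-in descriptor, pre-built dictionaries `dicts[c][s]` for class `c` and segment `s`, a simple greedy orthogonal matching pursuit solver, and a softmax over class-wise reconstruction residuals. All function names are hypothetical.

```python
import numpy as np

def segment_descriptors(frames, n_segments):
    """Step 1 (assumed form): split a (T, d) feature array into equal-length
    segments and average the observed frames. Missing frames are NaN rows;
    a segment with no observed frame yields None."""
    usable = (len(frames) // n_segments) * n_segments
    out = []
    for seg in np.split(frames[:usable], n_segments):
        observed = seg[~np.isnan(seg).any(axis=1)]
        out.append(observed.mean(axis=0) if len(observed) else None)
    return out

def omp(D, x, k):
    """Greedy orthogonal matching pursuit: code x with at most k atoms of D."""
    coef = np.zeros(D.shape[1])
    support = []
    residual = x.copy()
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:  # no new atom improves the fit
            break
        support.append(j)
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        coef[:] = 0.0
        coef[support] = sol
        residual = x - D @ coef
    return coef

def class_probabilities(descs, dicts, k=3):
    """Steps 2-4 (assumed form): for each observed segment, concatenate the
    per-class dictionaries, sparse-code the descriptor, and accumulate the
    residual obtained when only one class's atoms are kept. A softmax over
    the negative residuals stands in for the paper's probability model."""
    n_classes = len(dicts)
    errors = np.zeros(n_classes)
    for s, x in enumerate(descs):
        if x is None:  # segment entirely missing: contributes no evidence
            continue
        blocks = [dicts[c][s] for c in range(n_classes)]
        coef = omp(np.hstack(blocks), x, k)
        start = 0
        for c, block in enumerate(blocks):
            stop = start + block.shape[1]
            errors[c] += np.linalg.norm(x - block @ coef[start:stop])
            start = stop
    scores = np.exp(-(errors - errors.min()))
    return scores / scores.sum()
```

Because missing segments are skipped rather than imputed, the same routine handles one gap or several non-adjacent gaps; when every segment is missing, it falls back to a uniform distribution over classes.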
