Non-negative matrix completion for action detection

With the increasing number of videos all over the Internet and the increasing number of cameras looking at people around the world, one of the most interesting applications would be human activity recognition in videos. Many researches have been conducted in the literature for this purpose. But, still recognizing activities in a video with unrestricted conditions is a challenging problem. Moreover, finding the spatio-temporal location of the activity in the video is another issue. In this paper, we present a method based on a non-negative matrix completion framework, that learns to label videos with activity classes, and localizes the activity of interest spatio-temporally throughout the video. This approach has a multi-label weakly supervised setting for activity detection, with a convex optimization procedure. The experimental results show that the proposed approach is competitive with the state-of-the-art methods. Developing a multi-label classification framework with a convex optimization process for activity detection.Histogram correction for activity representation in each class, to localize activities in a weakly supervised setting.Proposing a new formulation for matrix completion to deal with classification/localization in video.Developing an activity recognition system in a totally weakly supervised multi-label setting.Developing a non-negative matrix completion framework based on Alternating Direction Method (ADM).

[1]  Hongdong Li,et al.  Element-Wise Factorization for N-View Projective Reconstruction , 2010, ECCV.

[2]  Mubarak Shah,et al.  Multi-agent event recognition by preservation of spatiotemporal relationships between probabilistic models , 2013, Image Vis. Comput..

[3]  Ying Wu,et al.  Discriminative Video Pattern Search for Efficient Action Detection , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Cordelia Schmid,et al.  Action recognition by dense trajectories , 2011, CVPR 2011.

[5]  Mubarak Shah,et al.  Spatiotemporal Deformable Part Models for Action Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Greg Mori,et al.  Action recognition by learning mid-level motion features , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  B. Schiele,et al.  Combined Object Categorization and Segmentation With an Implicit Shape Model , 2004 .

[8]  Yin Zhang,et al.  An alternating direction algorithm for matrix completion with nonnegative factors , 2011, Frontiers of Mathematics in China.

[9]  Yaser Sheikh,et al.  Exploring the space of a human action , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[10]  Shaogang Gong,et al.  Recognising action as clouds of space-time interest points , 2009, CVPR.

[11]  Junfeng Yang,et al.  Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization , 2012, Math. Comput..

[12]  Tomás Lozano-Pérez,et al.  A Framework for Multiple-Instance Learning , 1997, NIPS.

[13]  Cristian Sminchisescu,et al.  Convex Multiple-Instance Learning by Estimating Likelihood Ratio , 2010, NIPS.

[14]  Yasuyuki Matsushita,et al.  Camera calibration with lens distortion from low-rank textures , 2011, CVPR 2011.

[15]  Ying Wu,et al.  Discriminative subvolume search for efficient action detection , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Adriana Kovashka,et al.  Learning a hierarchy of discriminative space-time neighborhood features for human action recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[17]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, ICPR 2004.

[18]  Shuicheng Yan,et al.  Multi-task low-rank affinity pursuit for image segmentation , 2011, 2011 International Conference on Computer Vision.

[19]  Liang-Tien Chia,et al.  Motion Context: A New Representation for Human Action Recognition , 2008, ECCV.

[20]  G. Sapiro,et al.  A collaborative framework for 3D alignment and classification of heterogeneous subvolumes in cryo-electron tomography. , 2013, Journal of structural biology.

[21]  Juan Carlos Niebles,et al.  Spatial-Temporal correlatons for unsupervised action classification , 2008, 2008 IEEE Workshop on Motion and video Computing.

[22]  D. Forsyth,et al.  Video Event Detection: From Subvolume Localization To Spatio-Temporal Path Search. , 2013, IEEE transactions on pattern analysis and machine intelligence.

[23]  Thomas Hofmann,et al.  Support Vector Machines for Multiple-Instance Learning , 2002, NIPS.

[24]  Quoc V. Le,et al.  Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis , 2011, CVPR 2011.

[25]  Cordelia Schmid,et al.  Evaluation of Local Spatio-temporal Features for Action Recognition , 2009, BMVC.

[26]  Cordelia Schmid,et al.  A Spatio-Temporal Descriptor Based on 3D-Gradients , 2008, BMVC.

[27]  Jake K. Aggarwal,et al.  Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[28]  Mubarak Shah,et al.  Action MACH a spatio-temporal Maximum Average Correlation Height filter for action recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  David Elliott,et al.  In the Wild , 2010 .

[30]  Nazli Ikizler-Cinbis,et al.  Action Recognition and Localization by Hierarchical Space-Time Segments , 2013, 2013 IEEE International Conference on Computer Vision.

[31]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories , 2006 .

[32]  Alexandre Bernardino,et al.  Matrix Completion for Multi-label Image Classification , 2011, NIPS.

[33]  R. Andreani,et al.  On the Relation between Constant Positive Linear Dependence Condition and Quasinormality Constraint Qualification , 2005 .

[34]  Kristin P. Bennett,et al.  Fast Bundle Algorithm for Multiple-Instance Learning , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[36]  Alexandre Bernardino,et al.  Matrix Completion for Weakly-Supervised Multi-Label Image Classification , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Thomas G. Dietterich,et al.  Solving the Multiple Instance Problem with Axis-Parallel Rectangles , 1997, Artif. Intell..

[38]  Fernando De la Torre,et al.  Joint segmentation and classification of human actions in video , 2011, CVPR 2011.

[39]  Zicheng Liu,et al.  Hierarchical Filtered Motion for Action Recognition in Crowded Videos , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[40]  David A. Forsyth,et al.  Video Event Detection: From Subvolume Localization to Spatiotemporal Path Search , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[43]  Cordelia Schmid,et al.  Dense Trajectories and Motion Boundary Descriptors for Action Recognition , 2013, International Journal of Computer Vision.

[44]  J.K. Aggarwal,et al.  Human activity analysis , 2011, ACM Comput. Surv..

[45]  Junsong Yuan,et al.  Max-Margin Structured Output Regression for Spatio-Temporal Action Localization , 2012, NIPS.

[46]  Emmanuel J. Candès,et al.  Exact Matrix Completion via Convex Optimization , 2008, Found. Comput. Math..

[47]  Kristen Grauman,et al.  Efficient activity detection with max-subgraph search , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[48]  Yi Ma,et al.  The Augmented Lagrange Multiplier Method for Exact Recovery of Corrupted Low-Rank Matrices , 2010, Journal of structural biology.

[49]  Yang Wang,et al.  Discriminative figure-centric models for joint action localization and recognition , 2011, 2011 International Conference on Computer Vision.

[50]  Iasonas Kokkinos,et al.  Discovering discriminative action parts from mid-level video representations , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[51]  Martial Hebert,et al.  Trajectons: Action recognition through the motion analysis of tracked features , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.

[52]  James W. Davis,et al.  The Recognition of Human Movement Using Temporal Templates , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[53]  Emmanuel J. Candès,et al.  A Singular Value Thresholding Algorithm for Matrix Completion , 2008, SIAM J. Optim..

[54]  Dorin Comaniciu,et al.  Mean Shift: A Robust Approach Toward Feature Space Analysis , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[55]  Ioannis Patras,et al.  Supervised dictionary learning for action localization , 2013, 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[56]  Mubarak Shah,et al.  Action recognition in videos acquired by a moving camera using motion decomposition of Lagrangian particle trajectories , 2011, 2011 International Conference on Computer Vision.

[57]  Zicheng Liu,et al.  Cross-dataset action detection , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[58]  Riccardo Cicilloni La Torre C , 2015 .

[59]  Jean Ponce,et al.  Automatic annotation of human actions in video , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[60]  Carsten Rother,et al.  Learning discriminative localization from weakly labeled data , 2014, Pattern Recognit..

[61]  Fei-Fei Li,et al.  Learning latent temporal structure for complex event detection , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[62]  Jiebo Luo,et al.  Recognizing realistic actions from videos , 2009, CVPR.

[63]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[64]  Tao Xiang,et al.  Weakly Supervised Action Detection , 2011, BMVC.

[65]  Robert D. Nowak,et al.  Transduction with Matrix Completion: Three Birds with One Stone , 2010, NIPS.

[66]  Ioannis Patras,et al.  Learning codebook weights for action detection , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[67]  Brian C. Lovell,et al.  Spatio-temporal covariance descriptors for action and gesture recognition , 2013, 2013 IEEE Workshop on Applications of Computer Vision (WACV).

[68]  Ronald Poppe,et al.  A survey on vision-based human action recognition , 2010, Image Vis. Comput..

[69]  Ronen Basri,et al.  Actions as Space-Time Shapes , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[70]  Junsong Yuan,et al.  Optimal spatio-temporal path discovery for video event detection , 2011, CVPR 2011.