Sparse representation for recognizing object-to-object actions under occlusions

This paper proposes a novel event classification scheme that uses sparse representation to analyze interaction actions between persons. Two major challenges in person-to-person action analysis are occlusion and the high complexity of modeling complicated interactions. To address occlusion, the proposed scheme represents an action sample in an over-complete dictionary whose base elements are the training samples themselves. This representation is naturally sparse, and errors caused by environmental changes such as lighting or occlusions also appear sparsely with respect to the training library. This sparsity makes the representation robust to occlusions and lighting changes. In addition, a novel Hamming distance classification (HDC) scheme is proposed to classify action events into detailed types. Because Hamming codes are highly tolerant to noise, the HDC scheme is also robust to occlusions. The high complexity of modeling complicated actions can be tackled by adding more examples to the over-complete dictionary. Thus, even when the interaction relations are complicated, the proposed method still recognizes them successfully and can be easily extended to analyze action events among multiple persons. More importantly, the HDC scheme is very efficient and suitable for real-time applications because it involves no optimization process to calculate the reconstruction error.
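
The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the sparse solver, the features, or how the Hamming codes are built. As assumptions, the sketch uses a greedy orthogonal matching pursuit in place of whatever sparse solver the paper uses, picks the class with the smallest reconstruction residual, and treats HDC as a nearest-neighbor search over binary codes. The function names (`omp`, `src_classify`, `hamming_classify`) and the labels `"handshake"` and `"push"` are hypothetical.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Greedy orthogonal matching pursuit (assumed solver, n_nonzero >= 1)."""
    residual = y.astype(float).copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        # Pick the training sample (atom) most correlated with the residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break
        support.append(k)
        # Refit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    x[support] = coef
    return x

def src_classify(D, labels, y, n_nonzero=3):
    """Assign y to the class whose training columns reconstruct it best."""
    x = omp(D, y, n_nonzero)
    best_label, best_err = None, np.inf
    for c in np.unique(labels):
        mask = labels == c
        # Reconstruction error using only the coefficients of class c.
        err = np.linalg.norm(y - D[:, mask] @ x[mask])
        if err < best_err:
            best_label, best_err = c, err
    return best_label

def hamming_classify(codes, code_labels, query_code):
    """Label of the training code nearest in Hamming distance (no optimization)."""
    dists = np.count_nonzero(codes != query_code, axis=1)
    return code_labels[int(np.argmin(dists))]

# Toy dictionary: each column is one (hypothetical) training sample.
D = np.array([[1, 1, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 1],
              [0, 0, 0, 0]], dtype=float)
D = D / np.linalg.norm(D, axis=0)  # unit-norm atoms
labels = np.array(["handshake", "handshake", "push", "push"])

# Query close to the first class, with one corrupted entry standing in for occlusion.
y = np.array([1.0, 0.1, 0.0, 0.0, 0.0, 0.3])
print(src_classify(D, labels, y))

# Toy binary codes; the query differs from the first code by one flipped bit.
codes = np.array([[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 1, 1, 1]])
print(hamming_classify(codes, labels, np.array([1, 1, 0, 1])))
```

In this toy setup the corrupted entry lies outside the span of every training sample, so it inflates all class residuals equally and does not change which class reconstructs the query best. The Hamming step needs only bit comparisons, which is consistent with the abstract's claim that no optimization is involved at classification time.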
