Paper Information - Learning to Recognize Unsuccessful Activities Using a Two-Layer Latent Structural Model


In this paper, we propose to recognize unsuccessful activities (e.g., one tries to dress himself but fails), which have much more complex temporal structures, as we don't know when the activity performer fails (which is called the point of failure in this paper). We develop a two-layer latent structural SVM model to tackle this problem: the first layer specifies the point of failure, and the second layer specifies the temporal positions of a number of key stages accordingly. The stages before the point of failure are successful stages, while the stages after the point of failure are background stages. Given weakly labeled training data, our training algorithm alternates between inferring the two-layer latent structure and updating the structural SVM parameters. In recognition, our method can not only recognize unsuccessful activities, but also infer the latent structure. We demonstrate the effectiveness of our proposed method on several newly collected datasets.
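The two-layer inference described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes, purely for illustration, that each key stage already has a per-frame score under a "successful" model and under a "background" model, that stages must occur in strictly increasing temporal order, and that the total score is a plain sum. The names `best_ordered_positions`, `infer_latent_structure`, `success`, and `background` are hypothetical. The outer loop enumerates the point of failure (first layer), and a dynamic program places the key stages in time (second layer).

```python
from typing import List, Tuple

NEG_INF = float("-inf")


def best_ordered_positions(scores: List[List[float]]) -> Tuple[float, List[int]]:
    """Pick one frame per stage, strictly increasing in time, maximizing the summed score.

    scores[k][t] is an assumed score for placing stage k at frame t.
    """
    num_stages, num_frames = len(scores), len(scores[0])
    if num_frames < num_stages:
        raise ValueError("need at least one frame per stage")
    # best[k][t]: best total for stages 0..k with stage k placed at frame t
    best = [[NEG_INF] * num_frames for _ in range(num_stages)]
    back = [[-1] * num_frames for _ in range(num_stages)]
    best[0] = list(scores[0])
    for k in range(1, num_stages):
        run_best, run_arg = NEG_INF, -1  # running max of best[k-1][0..t-1]
        for t in range(num_frames):
            if t > 0 and best[k - 1][t - 1] > run_best:
                run_best, run_arg = best[k - 1][t - 1], t - 1
            if run_arg >= 0:
                best[k][t] = run_best + scores[k][t]
                back[k][t] = run_arg
    # Backtrack from the best position of the last stage.
    last = max(range(num_frames), key=lambda t: best[num_stages - 1][t])
    total = best[num_stages - 1][last]
    positions = [last]
    for k in range(num_stages - 1, 0, -1):
        last = back[k][last]
        positions.append(last)
    positions.reverse()
    return total, positions


def infer_latent_structure(
    success: List[List[float]], background: List[List[float]]
) -> Tuple[float, int, List[int]]:
    """Return (score, point_of_failure, stage_positions).

    First layer: try every point of failure p (number of stages completed).
    Second layer: stages before p use the success scores, the rest use
    the background scores, and their positions are found by the DP above.
    """
    num_stages = len(success)
    best = None
    for p in range(num_stages + 1):
        mixed = [success[k] if k < p else background[k] for k in range(num_stages)]
        total, positions = best_ordered_positions(mixed)
        if best is None or total > best[0]:
            best = (total, p, positions)
    return best
```

In a latent structural SVM training loop of the kind the abstract outlines, an inference step like this would be run on each weakly labeled video to fill in the latent structure, followed by a parameter update with that structure held fixed. The scoring used here is only a stand-in for whatever features and weights the actual model defines.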

Qiang Zhou | Gang Wang | G. Wang | Qiang-feng Zhou
