Human Activity Learning using Object Affordances from RGB-D Videos

Human activities comprise several sub-activities performed in a sequence and involve interactions with various objects. This makes reasoning about object affordances a central task for activity recognition. In this work, we consider the problem of jointly labeling object affordances and human activities from RGB-D videos. We frame the problem as a Markov Random Field where the nodes represent objects and sub-activities, and the edges represent the relationships between object affordances, their relations with sub-activities, and their evolution over time. We formulate the learning problem using a structural SVM approach, where labelings over various alternate temporal segmentations are considered as latent variables. We tested our method on a dataset comprising 120 activity videos collected from four subjects, and obtained an end-to-end precision of 81.8% and recall of 80.0% for labeling the activities.
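
To make the formulation concrete, here is a minimal sketch of a latent structural SVM over a pairwise MRF, consistent with the abstract's description; the feature maps $\phi_n, \phi_e$, the loss $\Delta$, and the regularization constant $C$ are illustrative notation assumed for exposition, not necessarily the paper's own. The score of a labeling $y$ under a temporal segmentation $h$ decomposes over nodes (objects and sub-activity segments) and edges (affordance–sub-activity and temporal relations):

$$ w^\top \Phi(x, y, h) \;=\; \sum_{i \in V(h)} w_{y_i}^\top \phi_n(x_i) \;+\; \sum_{(i,j) \in E(h)} w_{y_i y_j}^\top \phi_e(x_i, x_j), \qquad \hat{y} \;=\; \arg\max_{y,\,h} \; w^\top \Phi(x, y, h). $$

Treating the segmentation $h$ as a latent variable, training would then minimize a latent structural SVM objective in the style of Yu and Joachims (2009):

$$ \min_w \;\; \tfrac{1}{2}\|w\|^2 \;+\; C \sum_n \Big[ \max_{\hat{y}, \hat{h}} \big( \Delta(y_n, \hat{y}) + w^\top \Phi(x_n, \hat{y}, \hat{h}) \big) \;-\; \max_{h} \, w^\top \Phi(x_n, y_n, h) \Big], $$

where the difference of maxima makes the objective non-convex and amenable to CCCP-style alternating optimization: fix the latent segmentation for the ground-truth labels, then solve the resulting convex structural SVM problem.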
