Probabilistic semantic models for manipulation action representation and extraction

In this paper we present a hierarchical framework for representing manipulation actions and show its applicability to the problem of top-down action extraction from observation. The framework consists of novel probabilistic semantic models, which encode contact relations as probability distributions over the action phase. The models are action descriptive and can be used to provide probabilistic similarity scores for newly observed action sequences. The lower level of the representation consists of parametric hidden Markov models, which encode trajectory information. We present a framework for analysis of manipulation action primitives. The semantics of actions are modeled as probability distributions of semantic events over the action phase. The resulting probabilistic models are used in combination with lower-level trajectories. The approach is evaluated on the problem of extracting action primitives from observation.
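To make the idea of a contact relation encoded as a probability distribution over the action phase more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a single binary contact relation (for example, hand touching object), a phase normalized to [0, 1] and split into a fixed number of bins, a smoothed per-bin contact probability, and a similarity score defined as the geometric mean of per-frame Bernoulli likelihoods. The function names `fit_contact_model` and `similarity_score`, the binning, and the scoring rule are all hypothetical choices made for illustration.

```python
import numpy as np


def _phase_bins(n_frames, n_bins):
    """Map each frame index to a bin of the normalized action phase [0, 1]."""
    phase = (np.arange(n_frames) + 0.5) / n_frames
    return np.minimum((phase * n_bins).astype(int), n_bins - 1)


def fit_contact_model(demos, n_bins=4, alpha=1.0):
    """Estimate P(contact | phase bin) from demonstrations.

    demos: list of 0/1 sequences, one contact flag per frame (assumed input format).
    alpha: additive smoothing so that no bin has probability exactly 0 or 1.
    """
    on = np.zeros(n_bins)
    total = np.zeros(n_bins)
    for demo in demos:
        demo = np.asarray(demo, dtype=float)
        bins = _phase_bins(len(demo), n_bins)
        np.add.at(on, bins, demo)      # frames with contact per bin
        np.add.at(total, bins, 1.0)    # all frames per bin
    return (on + alpha) / (total + 2.0 * alpha)


def similarity_score(model, seq):
    """Geometric mean of per-frame likelihoods of seq under the model, in (0, 1]."""
    seq = np.asarray(seq, dtype=float)
    p = model[_phase_bins(len(seq), len(model))]
    loglik = seq * np.log(p) + (1.0 - seq) * np.log(1.0 - p)
    return float(np.exp(loglik.mean()))


# Toy demonstrations of different lengths: contact only in the second half.
demos = [[0] * 5 + [1] * 5, [0] * 10 + [1] * 10, [0] * 4 + [1] * 4]
model = fit_contact_model(demos)

matching = similarity_score(model, [0, 0, 0, 1, 1, 1])
mismatching = similarity_score(model, [1, 1, 1, 0, 0, 0])
print(model, matching, mismatching)
```

Normalizing by phase rather than by absolute time lets demonstrations of different durations contribute to the same model, which is why the toy demonstrations above have different lengths. A sequence whose contact pattern follows the learned distribution receives a higher score than one whose pattern is reversed.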
