Detection of Manipulation Action Consequences (MAC)

Action recognition and human activity analysis have been active research areas in Computer Vision and Robotics. While full-body motions can be characterized by movement and change of posture, no invariant characterization has yet been proposed for manipulation actions. We propose that a fundamental concept in understanding such actions is the consequence of the action: a small set of primitive action consequences provides a systematic high-level classification of manipulation actions. In this paper we develop a technique to recognize these action consequences. At its heart lies a novel active tracking and segmentation method that monitors changes in the appearance and topological structure of the manipulated object. These changes are then used in a visual semantic graph (VSG) based procedure, applied to the time sequence of the monitored object, to recognize the action consequence. We provide a new dataset, called Manipulation Action Consequences (MAC 1.0), which can serve as a test bed for further studies on this topic. Experiments on this dataset demonstrate that our method robustly tracks objects and detects their deformation and division during manipulation, and quantitative tests confirm the effectiveness and efficiency of the method.
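To make the idea concrete, the following is a minimal sketch of how a consequence could be read off the monitored object's topology and appearance over time. It assumes a per-frame summary of the tracked segment (number of connected components and segmented area); the `Frame` type, the function name, the area threshold, and the `ASSEMBLE` label are illustrative assumptions, while division and deformation are the consequence types the abstract explicitly mentions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Frame:
    n_parts: int   # connected components of the tracked object segment
    area: float    # total segmented area in pixels

def classify_consequence(frames: List[Frame], area_tol: float = 0.2) -> str:
    """Classify a manipulation consequence from topology/appearance changes."""
    first, last = frames[0], frames[-1]
    if last.n_parts > first.n_parts:
        return "DIVIDE"     # the object split into more parts (e.g. cutting)
    if last.n_parts < first.n_parts:
        return "ASSEMBLE"   # parts merged into fewer components (assumed label)
    if abs(last.area - first.area) / first.area > area_tol:
        return "DEFORM"     # same topology, but a large change in extent
    return "NO_CHANGE"      # topology and appearance preserved

# A cutting action: one segment becomes two.
print(classify_consequence([Frame(1, 1000.0), Frame(1, 1010.0), Frame(2, 980.0)]))
```

In the paper's actual method these decisions are made over a visual semantic graph of the scene rather than raw component counts, but the sketch captures the core signal: consequences are detected from changes in the object's topological structure, not from body motion.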
