Estimating Motion Codes from Demonstration Videos

A motion taxonomy can encode manipulations as binary-encoded representations, which we refer to as motion codes. These motion codes innately represent a manipulation action in an embedded space that describes the motion's mechanical features, including contact and trajectory types. The key advantage of embedding with motion codes is that motions are defined by robot-relevant features, and distances between motions can be measured directly on those features. In this paper, we develop a deep learning pipeline to extract motion codes from demonstration videos in an unsupervised manner so that the knowledge in these videos can be properly represented and used by robots. Our evaluations show that motion codes can be extracted from action demonstrations in the EPIC-KITCHENS dataset.
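To make the idea of a motion code concrete, the sketch below represents a manipulation as a fixed-length bit string built from a few mechanical attributes and compares two motions by Hamming distance over their codes. The attribute names, bit widths, and example encodings here are illustrative assumptions, not the exact fields of the taxonomy used in the paper.

```python
# Minimal sketch of motion codes as bit strings (attribute names are hypothetical).
from dataclasses import dataclass


@dataclass
class MotionCode:
    contact: int      # 1 bit: 0 = non-contact, 1 = contact
    trajectory: int   # 2 bits: 00 = none, 01 = prismatic, 10 = revolute, 11 = both
    deformation: int  # 1 bit: 0 = rigid interaction, 1 = deforming interaction

    def to_bits(self) -> str:
        """Concatenate the attribute values into one binary string (the motion code)."""
        return f"{self.contact:01b}{self.trajectory:02b}{self.deformation:01b}"


def hamming_distance(a: MotionCode, b: MotionCode) -> int:
    """Distance between two motions measured directly on their codes."""
    return sum(x != y for x, y in zip(a.to_bits(), b.to_bits()))


if __name__ == "__main__":
    pour = MotionCode(contact=0, trajectory=2, deformation=0)  # non-contact, revolute
    cut = MotionCode(contact=1, trajectory=1, deformation=1)   # contact, prismatic, deforming
    print(pour.to_bits(), cut.to_bits(), hamming_distance(pour, cut))
```

In this view, the deep learning pipeline described in the paper amounts to predicting each attribute of such a code from a demonstration video, so that dissimilar motions (here, pouring versus cutting) end up far apart in code space.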
