Semantic constraints to represent common sense required in household actions for multi-modal Learning-from-observation robot

Recent years have witnessed an increasing demand for service robots that can assist the elderly. Many elderly people now reside in senior residences that face support-staff shortages. Further, although such residences are comfortable and well supported, many elderly people still prefer to live in their own homes when possible, and fulfilling this desire requires additional support. It is therefore important and urgent to develop service robots that support the lives of the elderly in senior residences and/or in their own homes. The paradigm of learning-from-observation (LfO) is a promising direction toward this goal. An LfO system observes human actions and learns how to perform them from those observations. In senior residences and private homes, we assume that nurses and care workers are stationed or visit on a part-time basis. These novice users can teach and/or modify the robot’s actions through the LfO system instead of by manual programming. Further, even though each home environment varies widely, such caretakers can, if necessary, tune the robot’s actions through on-site demonstrations to absorb the environmental variations. Thus far, most LfO systems have been developed for relatively clean environments, such as machine assembly in industrial settings as done by [1] or rope handling in laboratory settings as done by [2]. The home environment is cluttered, and household actions have wide variations that require common sense to understand and pursue. To overcome this clutter …
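As a rough illustration of the observe-then-execute loop described above, the following Python sketch encodes a demonstrated household action as a sequence of labeled steps and replays it. This is a minimal sketch under assumed names: TaskStep, LearningFromObservation, and the skill and constraint labels are hypothetical and not the authors' system or API; the multi-modal perception and robot control of a real LfO system are replaced here by pre-labeled input and printed output.

```python
# Hypothetical sketch of a learning-from-observation (LfO) loop.
# All names below are illustrative assumptions, not the authors' framework.

from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskStep:
    """One recognized step of a demonstrated household action."""
    skill: str                                             # e.g. "grasp", "carry", "release"
    target: str                                            # object the skill acts on, e.g. "cup"
    constraints: List[str] = field(default_factory=list)   # semantic constraints, e.g. "keep-upright"


class LearningFromObservation:
    """Observe a human demonstration, build a task model, then execute it."""

    def observe(self, demonstration_frames: List[dict]) -> List[TaskStep]:
        # A real system would run perception (vision, speech) to segment the
        # demonstration into skills; here the frames already carry labels.
        return [
            TaskStep(f["skill"], f["target"], f.get("constraints", []))
            for f in demonstration_frames
        ]

    def execute(self, task_model: List[TaskStep]) -> None:
        # A robot controller would map each step to motor commands;
        # this sketch simply prints the plan.
        for step in task_model:
            print(f"{step.skill} {step.target} (constraints: {step.constraints})")


if __name__ == "__main__":
    lfo = LearningFromObservation()
    demo = [
        {"skill": "grasp", "target": "cup", "constraints": ["keep-upright"]},
        {"skill": "carry", "target": "cup", "constraints": ["keep-upright"]},
        {"skill": "release", "target": "cup"},
    ]
    plan = lfo.observe(demo)
    lfo.execute(plan)
```

The point of the sketch is only that a caretaker's demonstration, once segmented into skills with attached semantic constraints, becomes a reusable task model, which is the role the LfO paradigm plays in the household scenarios above.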

[1] Stefan Lee, et al. Chasing Ghosts: Instruction Following as Bayesian State Tracking, 2019, NeurIPS.

[2] Shaohua Yang, et al. Language to Action: Towards Interactive Task Learning with Physical Agents, 2018, IJCAI.

[3] Aude Billard, et al. Reinforcement learning for imitating constrained reaching movements, 2007, Adv. Robotics.

[4] Leo Breiman, et al. Random Forests, 2001, Machine Learning.

[5] Rui Liu, et al. A review of methodologies for natural-language-facilitated human–robot cooperation, 2017, International Journal of Advanced Robotic Systems.

[6] David Paulius, et al. Manipulation Motion Taxonomy and Coding for Robots, 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[7] A. Tucker, et al. Linear Inequalities and Related Systems, 1956.

[8] Maya Cakmak, et al. Keyframe-based Learning from Demonstration, 2012, Int. J. Soc. Robotics.

[9] Katsushi Ikeuchi, et al. Recognizing Assembly Tasks Through Human Demonstration, 2007, Int. J. Robotics Res.

[10] Stefanie Tellex, et al. A natural language planner interface for mobile manipulators, 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[11] Y. J. Tejwani, et al. Robot vision, 1989, IEEE International Symposium on Circuits and Systems.

[12] John E. Laird, et al. Learning Grounded Language through Situated Interactive Instruction, 2012, AAAI Fall Symposium: Robots Learning Interactively from Human Teachers.

[13] G. Tutz, et al. An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests, 2009, Psychological Methods.

[14] Brett Browning, et al. A survey of robot learning from demonstration, 2009, Robotics Auton. Syst.

[15] Michael Gleicher, et al. Inferring geometric constraints in human demonstrations, 2018, CoRL.

[16] Kimitoshi Yamazaki, et al. Learning from Demonstration Based on a Mechanism to Utilize an Object's Invisibility, 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[17] Henk Nijmeijer, et al. Robot Programming by Demonstration, 2010, SIMPAR.

[18] Medhat A. Moussa, et al. Toward a Natural Language Interface for Transferring Grasping Skills to Robots, 2008, IEEE Transactions on Robotics.

[19] Wendy A. Rogers, et al. Domestic Robots for Older Adults: Attitudes, Preferences, and Potential, 2014, Int. J. Soc. Robotics.

[20] Juan Pablo Wachs, et al. Extending Policy from One-Shot Learning through Coaching, 2019, 2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN).

[21] Jingxuan Li, et al. Learning Articulated Constraints From a One-Shot Demonstration for Robot Manipulation Planning, 2019, IEEE Access.

[22] U. Rembold, et al. KANTRA: human-machine interaction for intelligent robots using natural language, 1994, Proceedings of 1994 3rd IEEE International Workshop on Robot and Human Communication.

[23] Kazuhiro Sasabuchi, et al. Grasp-type Recognition Leveraging Object Affordance, 2020, ArXiv.

[24] David Paulius, et al. A Motion Taxonomy for Manipulation Embedding, 2020, Robotics: Science and Systems.

[25] Maya Cakmak, et al. Robot Programming by Demonstration with situated spatial language understanding, 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[26] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.

[27] Peter Ford Dominey, et al. The Coordinating Role of Language in Real-Time Multimodal Learning of Cooperative Tasks, 2013, IEEE Transactions on Autonomous Mental Development.

[28] Tamim Asfour, et al. Learning a bidirectional mapping between human whole-body motion and natural language using deep recurrent neural networks, 2017, Robotics Auton. Syst.

[29] C. Gauss, et al. General Investigations of Curved Surfaces, 1902.

[30] Katsushi Ikeuchi, et al. Toward an assembly plan from observation. I. Task recognition with polyhedral objects, 1994, IEEE Trans. Robotics Autom.

[31] Matthew R. Walter, et al. Understanding Natural Language Commands for Robotic Navigation and Mobile Manipulation, 2011, AAAI.

[32] K. Ikeuchi, et al. A Learning-from-Observation Framework: One-Shot Robot Teaching for Grasp-Manipulation-Release Household Operations, 2020, 2021 IEEE/SICE International Symposium on System Integration (SII).

[33] Katsushi Ikeuchi, et al. Task-Oriented Motion Mapping on Robots of Various Configuration Using Body Role Division, 2020, IEEE Robotics and Automation Letters.

[34] Dominik Henrich, et al. Control flow for robust one-shot robot programming using entity-based resources, 2017, 2017 18th International Conference on Advanced Robotics (ICAR).

[35] Iori Yanokura, et al. Verbal Focus-of-Attention System for Learning-from-Demonstration, 2020, ArXiv.

[36] Daniel Marcu, et al. Natural Language Communication with Robots, 2016, NAACL.

[37] Masayuki Inaba, et al. Understanding Action Sequences based on Video Captioning for Learning-from-Observation, 2021, ArXiv.

[38] Bernt Schiele, et al. Video Object Segmentation with Language Referring Expressions, 2018, ACCV.

[39] Katsushi Ikeuchi, et al. Representation for knot-tying tasks, 2006, IEEE Transactions on Robotics.

[40] Julie A. Shah, et al. C-LEARN: Learning geometric constraints from demonstrations for multi-step manipulation in shared autonomy, 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[41] Stefan Schaal, et al. Is imitation learning the route to humanoid robots?, 1999, Trends in Cognitive Sciences.