Representing, learning, and controlling complex object interactions

We present a framework for representing scenarios with complex object interactions, in which a robot cannot directly interact with the object it wishes to control and must instead influence it through intermediate objects. For instance, a robot learning to drive a car can change the car’s pose only indirectly, via the steering wheel, and must therefore represent and reason about two relationships: between its own grippers and the steering wheel, and between the steering wheel and the car. We formalize these interactions as chains and graphs of Markov decision processes (MDPs) and show how such models can be learned from data. We also consider how the resulting systems can be controlled given known or learned dynamics. We show that our composite model can be collapsed into a single MDP and solved to obtain an optimal policy for the combined system. Since the resulting MDP may be very large, we also introduce a planning algorithm that efficiently produces a potentially suboptimal policy. We apply these models to two systems in which a robot uses learning from demonstration to achieve indirect control: playing a computer game with a joystick, and using a hot water dispenser to heat a cup of water.
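To make the chain construction concrete, the sketch below collapses a two-link MDP chain into a single product MDP and solves it with standard value iteration. This is a minimal illustration under assumed tabular, deterministic dynamics; the function `collapse_chain` and all state spaces, transitions, and rewards in it are hypothetical and do not reproduce the paper’s actual formulation or code.

```python
import itertools
import numpy as np

# Minimal sketch: collapse a two-link MDP chain M1 -> M2 into one product
# MDP and solve it with value iteration. All names, state spaces, and
# dynamics below are hypothetical illustrations, not the paper's own code.

def collapse_chain(S1, A, T1, S2, T2, R, gamma=0.95, iters=500, tol=1e-8):
    """Collapse a deterministic chain M1 -> M2 into a single tabular MDP.

    The robot acts directly only in M1 (actions A, dynamics T1[s1][a]).
    M2 cannot be actuated directly: its "action" at each step is the
    current state of M1 (dynamics T2[s2][s1]), so the combined state is
    the pair (s1, s2) and the reward R is defined on combined states.
    """
    states = list(itertools.product(S1, S2))
    idx = {s: i for i, s in enumerate(states)}

    # Tabular next-state and reward arrays for the collapsed MDP.
    n, m = len(states), len(A)
    P = np.zeros((n, m), dtype=int)          # P[i, j] = next combined state
    Rm = np.array([R[s] for s in states])    # reward of combined state i
    for (s1, s2), i in idx.items():
        for j, a in enumerate(A):
            P[i, j] = idx[(T1[s1][a], T2[s2][s1])]

    # Standard value iteration on the collapsed MDP.
    V = np.zeros(n)
    for _ in range(iters):
        Q = Rm[:, None] + gamma * V[P]       # action values, shape (n, m)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    Q = Rm[:, None] + gamma * V[P]           # greedy policy from final V
    policy = {states[i]: A[int(Q[i].argmax())] for i in range(n)}
    return V, policy

# Toy usage: a 1-D joystick (M1) indirectly drives a cursor (M2) to a goal.
S1 = [-1, 0, 1]                               # joystick tilt
A = ['left', 'stay', 'right']
T1 = {s: {'left': max(s - 1, -1), 'stay': s, 'right': min(s + 1, 1)}
      for s in S1}
S2 = list(range(5))                           # cursor position
T2 = {s2: {s1: min(max(s2 + s1, 0), 4) for s1 in S1} for s2 in S2}
R = {(s1, s2): 1.0 if s2 == 4 else 0.0 for s1 in S1 for s2 in S2}
V, pi = collapse_chain(S1, A, T1, S2, T2, R)
```

The combined state space here is the product of the component state spaces, which illustrates why the collapsed MDP can grow large as chains lengthen or become graphs, and hence why the abstract’s efficient but potentially suboptimal planner is useful.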
