Cost Functions for Robot Motion Style

We focus on autonomously generating robot motion for day-to-day physical tasks that is expressive of a given style or emotion. Because we seek generalization across task instances and task types, we propose to capture style via cost functions that the robot uses to augment its nominal task cost and task constraints in a trajectory optimization process. We compare two approaches to representing such cost functions: a weighted linear combination of hand-designed features, and a neural network parameterization operating on raw trajectory input. For each cost type, we learn the parameters for each style from user feedback. In a user study, we compare both approaches against a nominal motion baseline across different tasks and styles, and find that they perform on par with each other and significantly outperform the baseline. Each approach has its advantages: featurized costs require learning fewer parameters and can perform better on some styles, while neural network representations do not require expert knowledge to design features and could capture more complex, nuanced costs than an expert can easily design.
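The sketch below illustrates the two cost representations described above and the kind of pairwise preference update one could use to learn the featurized weights (the reference list points to Bradley-Terry-style paired comparisons and preference-based reward learning). All feature choices, network sizes, and function names here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical hand-designed trajectory features (speed, smoothness, height);
# the actual feature set used in the paper is not reproduced here.
def trajectory_features(xi):
    """xi: (T, d) array of waypoints; returns a small feature vector."""
    vel = np.diff(xi, axis=0)
    acc = np.diff(vel, axis=0)
    return np.array([
        np.mean(np.linalg.norm(vel, axis=1)),   # average speed
        np.mean(np.linalg.norm(acc, axis=1)),   # smoothness proxy
        np.mean(xi[:, -1]),                      # e.g. mean value of one DOF
    ])

def featurized_style_cost(xi, w):
    """Weighted linear combination of hand-designed features."""
    return float(w @ trajectory_features(xi))

def neural_style_cost(xi, params):
    """Tiny MLP on the raw flattened trajectory; a stand-in for a
    neural-network cost parameterization."""
    W1, b1, W2, b2 = params
    h = np.tanh(W1 @ xi.ravel() + b1)
    return float(W2 @ h + b2)

def total_cost(xi, style_cost, task_cost):
    """Style cost augments the nominal task cost in trajectory optimization."""
    return task_cost(xi) + style_cost(xi)

# Preference-based learning of the featurized weights with a Bradley-Terry /
# logistic likelihood: the user says trajectory A better fits the style than B.
def preference_grad(w, xi_a, xi_b):
    f_a, f_b = trajectory_features(xi_a), trajectory_features(xi_b)
    # P(A preferred) = sigmoid(cost(B) - cost(A)); lower cost = more stylistic.
    z = w @ (f_b - f_a)
    p = 1.0 / (1.0 + np.exp(-z))
    return (1.0 - p) * (f_b - f_a)    # gradient of log-likelihood w.r.t. w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d = 20, 3
    xi_a, xi_b = rng.normal(size=(T, d)), rng.normal(size=(T, d))

    # Toy preference query: the user preferred xi_a; ascend the log-likelihood.
    w = np.zeros(3)
    for _ in range(100):
        w += 0.1 * preference_grad(w, xi_a, xi_b)
    print("learned feature weights:", w)

    # Evaluate the neural cost with randomly initialized (untrained) parameters.
    params = (0.1 * rng.normal(size=(8, T * d)), np.zeros(8),
              0.1 * rng.normal(size=8), 0.0)
    print("neural style cost:", neural_style_cost(xi_a, params))
```

The same logistic preference loss can be backpropagated through the neural cost instead of the linear one; the trade-off mirrors the abstract's conclusion, with far fewer parameters to learn in the featurized case.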
