Paper information - Learning individual motion preferences from audience feedback of motion sequences

Learning individual motion preferences from audience feedback of motion sequences

A robot performs a sequence of motions to animate a given input, e.g., dancing to music or telling a story. Each input is pre-processed to determine labels, e.g., emotions of the music or words in the story. Each label corresponds to multiple motions, and each motion has multiple labels. Therefore, the robot can choose one sequence from multiple motion sequences to animate the input. We aim to choose the best sequence to animate based on the audience's preferences. The audience prefers some motions over others, and each motion has an initially unknown preference value. At the end of the motion sequence, the audience provides feedback which is the sum of the motions' preference values. However, the observation of the feedback is noisy due to the device used to capture the audience's feedback. To select the most preferred sequence, the robot has to determine the sequence to query the audience with, so as to learn the preference values of individual motions from noisy observations of the audience's feedback. By learning the individual motion preference values, the most preferred sequence can be determined. Moreover, the audience may get bored of watching the same single motion in multiple sequences and the preference value will degrade based on the number of times the motion is viewed. We contribute MAK (Multi-Armed bandit and Kalman filter) and show that MAK outperforms least squares regression in selecting the best sequence with lower degradation in our simulation experiments.
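The feedback model in the abstract (a noisy observation of the sum of the preference values of the motions in a sequence) can be illustrated with a small simulation. The sketch below is not the authors' MAK algorithm: it picks query sequences uniformly at random instead of using a multi-armed bandit, it ignores preference degradation, and it ignores the label constraints on which motions may appear in a sequence. All names (`kalman_update`, `simulate`) and all parameter values are assumptions chosen for illustration. It shows how a Kalman filter measurement update and a least squares fit can each recover individual preference values from summed, noisy feedback.

```python
import numpy as np


def kalman_update(mean, cov, h, y, noise_var):
    """One Kalman measurement update for a scalar observation y = h @ v + noise.

    mean, cov : current estimate of the preference values and its covariance
    h         : 0/1 indicator vector of the motions in the queried sequence
    y         : noisy observed feedback (sum of the chosen motions' values)
    noise_var : variance of the observation noise (assumed known here)
    """
    s = h @ cov @ h + noise_var           # innovation variance (scalar)
    k = cov @ h / s                       # Kalman gain, shape (n,)
    mean = mean + k * (y - h @ mean)      # correct the estimate
    cov = cov - np.outer(k, h @ cov)      # shrink the uncertainty
    return mean, cov


def simulate(true_values, seq_len=3, n_queries=200, noise_std=0.5, seed=0):
    """Query random sequences and estimate per-motion preference values."""
    rng = np.random.default_rng(seed)
    n = len(true_values)
    mean = np.zeros(n)                    # prior mean: no preference known
    cov = np.eye(n) * 10.0                # broad prior (an assumption)
    rows, feedback = [], []
    for _ in range(n_queries):
        # Hypothetical query rule: a uniformly random set of distinct motions.
        chosen = rng.choice(n, size=seq_len, replace=False)
        h = np.zeros(n)
        h[chosen] = 1.0
        # Noisy feedback: sum of the chosen motions' values plus Gaussian noise.
        y = h @ true_values + rng.normal(0.0, noise_std)
        mean, cov = kalman_update(mean, cov, h, y, noise_std ** 2)
        rows.append(h)
        feedback.append(y)
    # Least squares baseline fitted on the same queries.
    ls_estimate, *_ = np.linalg.lstsq(np.array(rows), np.array(feedback), rcond=None)
    return mean, ls_estimate


if __name__ == "__main__":
    true_values = np.array([1.0, -0.5, 2.0, 0.3, 0.8])
    kf_estimate, ls_estimate = simulate(true_values)
    print("Kalman filter estimate:", np.round(kf_estimate, 2))
    print("Least squares estimate:", np.round(ls_estimate, 2))
```

With random queries and no degradation, the two estimates should be nearly identical; the differences the abstract reports arise from how queries are chosen and from degradation, neither of which this sketch models.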

Manuela M. Veloso | Junyun Tay | I-Ming Chen

[1] Sheikh Iqbal Ahamed, et al. Applying affective feedback to reinforcement learning in ZOEI, a comic humanoid robot, 2014, The 23rd IEEE International Symposium on Robot and Human Interactive Communication.

[2] A Savvy Robot Standup Comic: Online Learning through Audience Tracking, 2015.

[3] Manuela M. Veloso, et al. Modeling and composing gestures for human-robot interaction, 2012, 2012 IEEE RO-MAN: The 21st IEEE International Symposium on Robot and Human Interactive Communication.

[4] Pieter Abbeel, et al. Apprenticeship learning via inverse reinforcement learning, 2004, ICML.

[5] M. Veloso, et al. Allocating Training Instances to Learning Agents that Improve Coordination for Team Formation, 2014.

[6] Michèle Sebag, et al. APRIL: Active Preference-learning based Reinforcement Learning, 2012, ECML/PKDD.

[7] Manuela M. Veloso, et al. Autonomous robot dancing driven by beats and emotions of music, 2012, AAMAS.

[8] Manuela M. Veloso, et al. Team formation with learning agents that improve coordination, 2014, AAMAS.

[9] W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, 1933.