Learning individual motion preferences from audience feedback of motion sequences

A robot performs a sequence of motions to animate a given input, e.g., dancing to music or telling a story. Each input is pre-processed to determine labels, e.g., the emotions of the music or the words in the story. Each label corresponds to multiple motions, and each motion has multiple labels, so the robot can choose among many candidate motion sequences to animate the same input. We aim to choose the best sequence based on the audience's preferences. The audience prefers some motions over others, and each motion has an initially unknown preference value. At the end of a motion sequence, the audience provides feedback, which is the sum of the preference values of the motions in that sequence. However, the observation of this feedback is noisy because of the device used to capture it. To select the most preferred sequence, the robot must decide which sequences to query the audience with, so as to learn the preference values of individual motions from noisy observations of the audience's feedback. Once the individual motion preference values are learned, the most preferred sequence can be determined. Moreover, the audience may get bored of watching the same motion across multiple sequences, so a motion's preference value degrades with the number of times the motion is viewed. We contribute MAK (Multi-Armed bandit and Kalman filter) and show in our simulation experiments that MAK outperforms least squares regression in selecting the best sequence, and does so with lower degradation.
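
To make the feedback model concrete, the following is a minimal sketch, not the MAK algorithm itself, whose details are not given in this abstract. It simulates feedback as the noisy sum of the motions' preference values and estimates the individual values with a standard Kalman-filter measurement update. The function names, the Gaussian noise, the prior, the example values and the round-robin choice of query sequences are all assumptions made for illustration. The degradation of preference values with repeated viewing is left out.

```python
import random


def observe_feedback(sequence, true_values, noise_std, rng):
    """Simulated feedback: sum of the motions' preference values plus
    zero-mean Gaussian noise (the noise model is an assumption)."""
    return sum(true_values[m] for m in sequence) + rng.gauss(0.0, noise_std)


def kalman_update(mean, cov, sequence, feedback, noise_var):
    """One Kalman measurement update for z = h^T x + v, where h[i] counts
    how many times motion i appears in the queried sequence."""
    n = len(mean)
    h = [0.0] * n
    for m in sequence:
        h[m] += 1.0
    # P h (cov is symmetric, so this also equals (h^T P)^T)
    ph = [sum(cov[i][j] * h[j] for j in range(n)) for i in range(n)]
    innovation_var = sum(h[i] * ph[i] for i in range(n)) + noise_var
    gain = [ph[i] / innovation_var for i in range(n)]
    residual = feedback - sum(h[i] * mean[i] for i in range(n))
    new_mean = [mean[i] + gain[i] * residual for i in range(n)]
    new_cov = [[cov[i][j] - gain[i] * ph[j] for j in range(n)]
               for i in range(n)]
    return new_mean, new_cov


def estimate_preferences(sequences, true_values, noise_std, rounds, seed=0):
    """Query the sequences round-robin (a placeholder query policy, not a
    bandit strategy) and return the estimated preference values."""
    rng = random.Random(seed)
    n = len(true_values)
    mean = [0.0] * n  # assumed prior mean
    cov = [[100.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for t in range(rounds):
        seq = sequences[t % len(sequences)]
        z = observe_feedback(seq, true_values, noise_std, rng)
        mean, cov = kalman_update(mean, cov, seq, z, noise_std ** 2)
    return mean


def best_sequence(sequences, values):
    """Pick the sequence with the largest summed preference value."""
    return max(sequences, key=lambda seq: sum(values[m] for m in seq))


if __name__ == "__main__":
    true_values = [3.0, 1.0, 2.0]          # hypothetical preference values
    sequences = [[0, 1], [1, 2], [0, 2]]   # hypothetical candidate sequences
    estimates = estimate_preferences(sequences, true_values, 0.1, 300)
    print(estimates, best_sequence(sequences, estimates))
```

The round-robin policy is only a placeholder. Choosing which sequence to query next is where a bandit-style strategy would come in, and that choice is not specified here.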