Face valuing: Training user interfaces with facial expressions and reinforcement learning

An important application of interactive machine learning is extending or amplifying the cognitive and physical capabilities of a human. To accomplish this, machines need to learn about their human users' intentions and adapt to their preferences. In most current research, users convey their preferences to a machine through explicit corrective or instructive feedback, which imposes a cognitive load on the user and is expensive in terms of human effort. The primary objective of this work is to demonstrate that a learning agent can reduce the amount of explicit feedback required to adapt to a user's task preferences by learning to perceive the value of its behavior directly from the user, in particular from the user's facial expressions; we call this face valuing. We empirically evaluate face valuing on a grip selection task. Our preliminary results suggest that an agent can quickly adapt to a user's changing preferences with minimal explicit feedback by learning a value function that maps facial features extracted from a camera image to expected future reward. We believe that an agent learning to perceive value from the body language of its human user complements existing interactive machine learning approaches and will help in creating successful human-machine interactive applications.
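
To make the mechanism concrete, the following is a minimal sketch (not the paper's implementation) of the value-learning component: a linear value function over facial-feature vectors, updated online with TD(lambda). The feature extractor, the class name FaceValuer, and the hyperparameter values are illustrative assumptions; the abstract only specifies that facial features extracted from a camera image are mapped to expected future reward.

    import numpy as np

    def extract_face_features(frame):
        # Hypothetical stand-in for a facial-feature extractor, e.g.
        # normalized landmark coordinates produced by a face-alignment
        # model applied to one camera frame.
        return np.asarray(frame, dtype=float)

    class FaceValuer:
        """Linear value function V(x) = w.x over facial features,
        learned online with TD(lambda) and accumulating traces."""

        def __init__(self, n_features, alpha=0.1, gamma=0.99, lam=0.8):
            self.w = np.zeros(n_features)   # value-function weights
            self.z = np.zeros(n_features)   # eligibility traces
            self.alpha, self.gamma, self.lam = alpha, gamma, lam

        def value(self, x):
            return float(np.dot(self.w, x))

        def update(self, x, reward, x_next, done=False):
            # TD error: how much reward + gamma*V(x') differs from V(x).
            v_next = 0.0 if done else self.value(x_next)
            delta = reward + self.gamma * v_next - self.value(x)
            # Decay the traces and mark the current features as eligible.
            self.z = self.gamma * self.lam * self.z + x
            self.w += self.alpha * delta * self.z
            return delta

On each time step the agent would extract features from the current camera frame, act, observe any task reward, and call update; the eligibility traces let a delayed reward propagate back to the facial expressions that preceded it, so the learned value function can stand in for explicit feedback.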
