Affect-based Intrinsic Rewards for Learning General Representations

Positive affect has been linked to increased interest, curiosity, and satisfaction in human learning. In reinforcement learning, extrinsic rewards are often sparse and difficult to define; intrinsically motivated learning can help address these challenges. We argue that positive affect is an important intrinsic reward that helps drive the exploration needed to gather experiences critical to learning general representations. We present a novel approach that leverages a task-independent intrinsic reward function trained on spontaneous smile behavior, which captures positive affect. To evaluate the approach, we trained several downstream computer vision tasks on data collected with our policy and with several baseline methods. We show that the policy based on intrinsic affective rewards increases episode duration and area explored while reducing collisions. The result is faster learning on several downstream computer vision tasks.
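As a rough illustration of how such an affect-based signal might be folded into a standard reinforcement learning setup, the sketch below blends a learned positive-affect score with the environment's extrinsic reward. This is a minimal sketch under stated assumptions: the `AffectRewardModel` architecture, its name, and the `LAMBDA_AFFECT` weighting are illustrative choices, not details taken from the paper.

```python
import torch
import torch.nn as nn

class AffectRewardModel(nn.Module):
    """Hypothetical model that maps the agent's visual observation to a
    positive-affect score in [0, 1] (e.g., learned from smile responses)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 1), nn.Sigmoid(),
        )

    def forward(self, observation):
        # observation: batch of RGB frames, shape (N, 3, H, W)
        return self.encoder(observation).squeeze(-1)


LAMBDA_AFFECT = 0.5  # assumed weighting of the intrinsic term

def combined_reward(extrinsic_reward, observation, affect_model):
    """Blend a (possibly sparse) extrinsic reward with the affect-based
    intrinsic reward predicted from the current observation."""
    with torch.no_grad():
        intrinsic = affect_model(observation)
    return extrinsic_reward + LAMBDA_AFFECT * intrinsic
```

In a training loop, this combined reward would simply stand in for the raw environment reward passed to the policy update, so the agent is nudged toward states associated with positive affect even when extrinsic feedback is sparse.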
