Affect-based Intrinsic Rewards for Learning General Representations

Positive affect has been linked to increased interest, curiosity, and satisfaction in human learning. In reinforcement learning, extrinsic rewards are often sparse and difficult to define; intrinsically motivated learning can help address these challenges. We argue that positive affect is an important intrinsic reward that effectively drives exploration, which in turn is useful for gathering experiences. We present a novel approach that leverages a task-independent intrinsic reward function trained on spontaneous smile behavior, a signal that captures positive affect. To evaluate our approach, we trained several downstream computer vision tasks on data collected with our policy and with several baseline methods. We show that the policy based on intrinsic affective rewards successfully increases episode duration and the area explored, and reduces collisions. The result is faster learning on several downstream computer vision tasks.
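The idea of shaping a sparse extrinsic reward with an affect-based intrinsic term can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: `predict_positive_affect`, its interface, and the mixing weight are all assumptions standing in for a model trained on spontaneous smile responses.

```python
def predict_positive_affect(observation):
    # Placeholder for a learned model that maps the agent's first-person
    # observation to a predicted positive-affect score in [0, 1].
    # (Hypothetical; the actual model and features are not specified here.)
    return 0.5


def shaped_reward(extrinsic_reward, observation, weight=0.1):
    """Combine a (possibly sparse) extrinsic reward with a
    task-independent, affect-based intrinsic reward."""
    intrinsic = predict_positive_affect(observation)
    return extrinsic_reward + weight * intrinsic
```

Because the intrinsic term depends only on the observation, the same reward function can be reused across tasks; the weight controls how strongly predicted positive affect biases exploration.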
