Paper Information - Modeling Affect-based Intrinsic Rewards for Exploration and Learning

Positive affect has been linked to increased interest, curiosity and satisfaction in human learning. In reinforcement learning, extrinsic rewards are often sparse and difficult to define; intrinsically motivated learning can help address these challenges. We argue that positive affect is an important intrinsic reward that helps drive exploration that is useful for gathering experiences. We present a novel approach leveraging a task-independent reward function trained on spontaneous smile behavior that reflects the intrinsic reward of positive affect. To evaluate our approach, we trained models for several downstream computer vision tasks on data collected with our policy and with several baseline methods. We show that the policy based on our affective rewards increases the duration of episodes and the area explored, and reduces collisions. The result is faster learning on several downstream computer vision tasks.
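
The abstract reports three exploration measures: episode duration, area explored, and collisions. As a rough illustration only, the sketch below shows one way such measures could be computed from a logged trajectory. The function name `exploration_metrics`, the grid-cell discretization, and the `cell_size` parameter are assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, Iterable, Tuple


def exploration_metrics(
    positions: Iterable[Tuple[float, float]],
    collided: Iterable[bool],
    cell_size: float = 1.0,
) -> Dict[str, float]:
    """Summarize one episode (hypothetical helper, not the paper's code).

    positions: (x, y) location of the agent at each step.
    collided:  True for each step on which a collision was logged.
    cell_size: side length of the square grid cells used to
               approximate the area covered (an assumption).
    """
    positions = list(positions)
    collided = list(collided)
    if len(positions) != len(collided):
        raise ValueError("positions and collided must have the same length")
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")

    # Map every position to the grid cell that contains it; the set keeps
    # each visited cell once, however many steps were spent inside it.
    visited = {
        (int(x // cell_size), int(y // cell_size)) for x, y in positions
    }

    return {
        "duration_steps": len(positions),
        "area_explored": len(visited) * cell_size ** 2,
        "collisions": sum(collided),
    }


if __name__ == "__main__":
    # Toy trajectory: four steps, one logged collision.
    path = [(0.2, 0.1), (0.8, 0.4), (1.5, 0.2), (1.6, 1.1)]
    hits = [False, False, True, False]
    print(exploration_metrics(path, hits))
```

Counting unique grid cells is a coarse proxy for covered area; a finer `cell_size` gives a tighter estimate at the cost of being more sensitive to small movements.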

Daniel McDuff | Ashish Kapoor | Dean Zadok
