Crossmodal Attentive Skill Learner

This paper presents the Crossmodal Attentive Skill Learner (CASL), integrated with the recently-introduced Asynchronous Advantage Option-Critic (A2OC) architecture [Harb et al., 2017] to enable hierarchical reinforcement learning across multiple sensory inputs. We provide concrete examples where the approach not only improves performance in a single task, but accelerates transfer to new tasks. We demonstrate the attention mechanism anticipates and identifies useful latent features, while filtering irrelevant sensor modalities during execution. We modify the Arcade Learning Environment [Bellemare et al., 2013] to support audio queries, and conduct evaluations of crossmodal learning in the Atari 2600 game Amidar. Finally, building on the recent work of Babaeizadeh et al. [2017], we open-source a fast hybrid CPU-GPU implementation of CASL.

[1]  Richard S. Sutton,et al.  Introduction to Reinforcement Learning , 1998 .

[2]  Christopher D. Manning,et al.  Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.

[3]  Joshua B. Tenenbaum,et al.  Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation , 2016, NIPS.

[4]  Wolfram Burgard,et al.  Multimodal deep learning for robust RGB-D object recognition , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[5]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[6]  Marlos C. Machado,et al.  A Laplacian Framework for Option Discovery in Reinforcement Learning , 2017, ICML.

[7]  Robert C. Wilson,et al.  Reinforcement Learning in Multidimensional Environments Relies on Attention Mechanisms , 2015, The Journal of Neuroscience.

[8]  Tom Schaul,et al.  FeUdal Networks for Hierarchical Reinforcement Learning , 2017, ICML.

[9]  A. Nakamura,et al.  Nature (London , 1975 .

[10]  Juhan Nam,et al.  Multimodal Deep Learning , 2011, ICML.

[11]  Demis Hassabis,et al.  Mastering the game of Go without human knowledge , 2017, Nature.

[12]  N. Mackintosh A Theory of Attention: Variations in the Associability of Stimuli with Reinforcement , 1975 .

[13]  Stephen Tyree,et al.  Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU , 2016, ICLR.

[14]  Fabio Tozeto Ramos,et al.  Online learning for scene segmentation with laser-constrained CRFs , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[15]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[16]  Andrew G. Barto,et al.  Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining , 2009, NIPS.

[17]  Dan Klein,et al.  Modular Multitask Reinforcement Learning with Policy Sketches , 2016, ICML.

[18]  Yuan Chang Leong,et al.  Dynamic Interaction between Reinforcement Learning and Attention in Multidimensional Environments , 2017, Neuron.

[19]  Doina Precup,et al.  Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..

[20]  Doina Precup,et al.  When Waiting is not an Option : Learning Options with a Deliberation Cost , 2017, AAAI.

[21]  Marc G. Bellemare,et al.  The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res..

[22]  J. Pearce,et al.  A model for Pavlovian learning: Variations in the effectiveness of conditioned but not of unconditioned stimuli. , 1980 .

[23]  Geoffrey E. Hinton,et al.  Grammar as a Foreign Language , 2014, NIPS.

[24]  Li Fei-Fei,et al.  Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos , 2015, International Journal of Computer Vision.

[25]  Diyi Yang,et al.  Hierarchical Attention Networks for Document Classification , 2016, NAACL.

[26]  Roland Siegwart,et al.  A robust and modular multi-sensor fusion approach applied to MAV navigation , 2013, 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[27]  Sridhar Mahadevan,et al.  Hierarchical multi-agent reinforcement learning , 2001, AGENTS '01.

[28]  N. Mackintosh,et al.  Two theories of attention: a review and a possible integration , 2010 .

[29]  Fethi Bougares,et al.  Multimodal Attention for Neural Machine Translation , 2016, ArXiv.

[30]  M. Carrasco Visual attention: The past 25 years , 2011, Vision Research.

[31]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[32]  Bruno Castro da Silva,et al.  Learning Parameterized Skills , 2012, ICML.

[33]  Ruslan Salakhutdinov,et al.  Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models , 2014, ArXiv.

[34]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[35]  Leslie Pack Kaelbling,et al.  Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..

[36]  Doina Precup,et al.  The Option-Critic Architecture , 2016, AAAI.

[37]  Peter Stone,et al.  Deep Recurrent Q-Learning for Partially Observable MDPs , 2015, AAAI Fall Symposia.

[38]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[39]  J. Pearce,et al.  A model for Pavlovian learning: variations in the effectiveness of conditioned but not of unconditioned stimuli. , 1980, Psychological review.

[40]  Simona Nobili,et al.  Heterogeneous Sensor Fusion for Accurate State Estimation of Dynamic Legged Robots , 2017, Robotics: Science and Systems.

[41]  Alex Graves,et al.  Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.

[42]  Jana Kosecka,et al.  Semantic segmentation with heterogeneous sensor coverages , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[43]  Koray Kavukcuoglu,et al.  Visual Attention , 2020, Computational Models for Cognitive Vision.

[44]  Mikhail Pavlov,et al.  Deep Attention Recurrent Q-Network , 2015, ArXiv.

[45]  Nitish Srivastava,et al.  Multimodal learning with deep Boltzmann machines , 2012, J. Mach. Learn. Res..

[46]  Sebastian Scherer,et al.  Robust multi-sensor fusion for micro aerial vehicle navigation in GPS-degraded/denied environments , 2014, 2014 American Control Conference.

[47]  Nebojsa Jojic,et al.  Audio-Video Sensor Fusion with Probabilistic Graphical Models , 2002, ECCV.

[48]  Alex Graves,et al.  Recurrent Models of Visual Attention , 2014, NIPS.