SVQN: Sequential Variational Soft Q-Learning Networks
Shiyu Huang | Hang Su | Jun Zhu | Ting Chen