EX2: Exploration with Exemplar Models for Deep Reinforcement Learning
[1] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 1998, Machine Learning.
[2] Benjamin Van Roy, et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[3] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.
[4] Alexei A. Efros, et al. Ensemble of Exemplar-SVMs for Object Detection and Beyond, 2011, ICCV.
[5] Tom Schaul, et al. Unifying Count-Based Exploration and Intrinsic Motivation, 2016, NIPS.
[6] Nuttapong Chentanez, et al. Intrinsically Motivated Reinforcement Learning, 2004, NIPS.
[7] Doina Precup, et al. Learning Options in Reinforcement Learning, 2002, SARA.
[8] S. Shankar Sastry, et al. Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning, 2017, arXiv.
[9] Pieter Abbeel, et al. Stochastic Neural Networks for Hierarchical Reinforcement Learning, 2016, ICLR.
[10] Jürgen Schmidhuber, et al. A Possibility for Implementing Curiosity and Boredom in Model-Building Neural Controllers, 1991.
[11] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[12] John Langford, et al. Exploration in Metric State Spaces, 2003, ICML.
[13] Michael L. Littman, et al. An Analysis of Model-Based Interval Estimation for Markov Decision Processes, 2008, J. Comput. Syst. Sci.
[14] Joshua B. Tenenbaum, et al. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, 2016, NIPS.
[15] Jason Pazis, et al. PAC Optimal Exploration in Continuous Space Markov Decision Processes, 2013, AAAI.
[16] Filip De Turck, et al. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning, 2016, NIPS.
[17] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[18] Filip De Turck, et al. VIME: Variational Information Maximizing Exploration, 2016, NIPS.
[19] Wojciech Zaremba, et al. Improved Techniques for Training GANs, 2016, NIPS.
[20] Andrew Y. Ng, et al. Near-Bayesian Exploration in Polynomial Time, 2009, ICML.
[21] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems, 2012, Found. Trends Mach. Learn.
[22] Shane Legg, et al. Human-Level Control Through Deep Reinforcement Learning, 2015, Nature.
[23] Honglak Lee, et al. Action-Conditional Video Prediction Using Deep Networks in Atari Games, 2015, NIPS.
[24] Ronen I. Brafman, et al. R-MAX: A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[25] Fernando Diaz, et al. Exploratory Gradient Boosting for Reinforcement Learning in Complex Domains, 2016, arXiv.
[26] Alexei A. Efros, et al. Curiosity-Driven Exploration by Self-Supervised Prediction, 2017, CVPR Workshops.
[27] Lihong Li, et al. An Empirical Evaluation of Thompson Sampling, 2011, NIPS.
[28] Yann LeCun, et al. Deep Multi-Scale Video Prediction Beyond Mean Square Error, 2015, ICLR.
[29] Yuval Tassa, et al. Learning and Transfer of Modulated Locomotor Controllers, 2016, arXiv.
[30] Sridhar Mahadevan, et al. Recent Advances in Hierarchical Reinforcement Learning, 2003, Discret. Event Dyn. Syst.
[31] Sergey Levine, et al. Incentivizing Exploration in Reinforcement Learning with Deep Predictive Models, 2015, arXiv.
[32] Pieter Abbeel, et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.
[33] Yuval Tassa, et al. Continuous Control with Deep Reinforcement Learning, 2015, ICLR.