An Intrinsically-Motivated Approach for Learning Highly Exploring and Fast Mixing Policies
[1] Nuttapong Chentanez, et al. Intrinsically Motivated Learning of Hierarchical Collections of Skills, 2004.
[2] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[3] V. Climenhaga. Markov chains and mixing times, 2013.
[4] Michael L. Littman, et al. An analysis of model-based Interval Estimation for Markov Decision Processes, 2008, J. Comput. Syst. Sci.
[5] Alessandro Lazaric, et al. Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning, 2018, ICML.
[6] Peter Auer, et al. Autonomous Exploration For Navigating In MDPs, 2012, COLT.
[7] Sergey Levine, et al. Skew-Fit: State-Covering Self-Supervised Reinforcement Learning, 2019, ICML.
[8] Amos J. Storkey, et al. Exploration by Random Network Distillation, 2018, ICLR.
[9] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[10] Filip De Turck, et al. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning, 2016, NIPS.
[11] J. Hunter. Some stochastic properties of semi-magic and magic Markov chains, 2010.
[12] John Odentrantz, et al. Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues, 2000, Technometrics.
[13] Alexandros G. Dimakis, et al. Optimized Markov Chain Monte Carlo for Signal Detection in MIMO Systems: An Analysis of the Stationary Distribution and Mixing Time, 2013, IEEE Transactions on Signal Processing.
[14] David K. Smith, et al. Dynamic Programming and Optimal Control. Volume 1, 1996.
[15] Kenneth O. Stanley, et al. Go-Explore: a New Approach for Hard-Exploration Problems, 2019, ArXiv.
[16] Andrea Bonarini, et al. Incremental Skill Acquisition for Self-motivated Learning Animats, 2006, SAB.
[17] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[18] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[19] Dimitri P. Bertsekas. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[20] Kaare Brandt Petersen, et al. The Matrix Cookbook, 2006.
[21] Pierre-Yves Oudeyer, et al. What is Intrinsic Motivation? A Typology of Computational Approaches, 2007, Frontiers Neurorobotics.
[22] S. Kirkland. Column sums and the conditioning of the stationary distribution for a stochastic matrix, 2010.
[23] Sham M. Kakade, et al. Provably Efficient Maximum Entropy Exploration, 2018, ICML.
[24] Alexei A. Efros, et al. Curiosity-Driven Exploration by Self-Supervised Prediction, 2017, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[25] Jürgen Schmidhuber. A possibility for implementing curiosity and boredom in model-building neural controllers, 1991.
[26] Imre Csiszár, et al. Context tree estimation for not necessarily finite memory processes, via BIC and MDL, 2005, IEEE Transactions on Information Theory.
[27] Stephen P. Boyd, et al. Fastest Mixing Markov Chain on a Graph, 2004, SIAM Rev.
[28] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[29] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[30] Alessandro Lazaric, et al. Active Exploration in Markov Decision Processes, 2019, AISTATS.
[31] Tom Schaul, et al. Unifying Count-Based Exploration and Intrinsic Motivation, 2016, NIPS.
[32] Alexei A. Efros, et al. Large-Scale Study of Curiosity-Driven Learning, 2018, ICLR.
[33] Shakir Mohamed, et al. Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning, 2015, NIPS.
[34] Nuttapong Chentanez, et al. Intrinsically Motivated Reinforcement Learning, 2004, NIPS.
[35] Martin Grötschel, et al. The Ellipsoid Method, 1993.
[36] Filip De Turck, et al. VIME: Variational Information Maximizing Exploration, 2016, NIPS.
[37] David Barber, et al. Variational methods for Reinforcement Learning, 2010, AISTATS.
[38] Pierre-Yves Oudeyer, et al. Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress, 2012, NIPS.
[39] Yasemin Altun, et al. Relative Entropy Policy Search, 2010.
[40] Sergey Levine, et al. Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models, 2015, ArXiv.
[41] Stephen P. Boyd, et al. CVXPY: A Python-Embedded Modeling Language for Convex Optimization, 2016, J. Mach. Learn. Res.
[42] P. Schweitzer. Perturbation theory and finite Markov chains, 1968.
[43] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[44] Marc G. Bellemare, et al. Count-Based Exploration with Neural Density Models, 2017, ICML.