Guided Policy Exploration for Markov Decision Processes Using an Uncertainty-Based Value-of-Information Criterion
[1] Sebastian Thrun, et al. Active Exploration in Dynamic Environments, 1991, NIPS.
[2] Lihong Li, et al. Incremental Model-based Learners with Formal Learning-Time Guarantees, 2006, UAI.
[3] Tingwen Huang, et al. Model-Free Optimal Tracking Control via Critic-Only Q-Learning, 2016, IEEE Transactions on Neural Networks and Learning Systems.
[4] David Andre, et al. Model-Based Bayesian Exploration, 1999, UAI.
[5] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[6] José Carlos Príncipe, et al. A Model-Based Approach to Exploration of Continuous-State MDPs Using Divergence-to-Go, 2015, 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP).
[7] José Carlos Príncipe, et al. Balancing Exploration and Exploitation in Reinforcement Learning Using a Value of Information Criterion, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.
[9] Geoffrey C. Fox, et al. Vector Quantization by Deterministic Annealing, 1992, IEEE Trans. Inf. Theory.
[10] Frank L. Lewis, et al. Off-Policy Reinforcement Learning for Synchronization in Multiagent Graphical Games, 2017, IEEE Transactions on Neural Networks and Learning Systems.
[11] Jesse Hoey, et al. An Analytic Solution to Discrete Bayesian Reinforcement Learning, 2006, ICML.
[12] Michael Kearns, et al. Efficient Reinforcement Learning in Factored MDPs, 1999, IJCAI.
[13] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[14] Huaguang Zhang, et al. Adaptive Fault-Tolerant Tracking Control for MIMO Discrete-Time Systems via Reinforcement Learning Algorithm with Less Learning Parameters, 2017, IEEE Transactions on Automation Science and Engineering.
[15] Andrew G. Barto, et al. Convergence of Indirect Adaptive Asynchronous Value Iteration Algorithms, 1993, NIPS.
[16] Lawrence K. Saul, et al. Learning Curve Bounds for a Markov Decision Process with Undiscounted Rewards, 1996, COLT '96.
[17] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[18] Joelle Pineau, et al. A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes, 2011, J. Mach. Learn. Res.
[19] Shie Mannor, et al. Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning, 2003, ICML.
[20] José Carlos Príncipe, et al. Analysis of Agent Expertise in Ms. Pac-Man Using Value-of-Information-Based Policies, 2017, IEEE Transactions on Games.
[21] Andrew Y. Ng, et al. Near-Bayesian Exploration in Polynomial Time, 2009, ICML '09.
[22] Lihong Li, et al. PAC Model-Free Reinforcement Learning, 2006, ICML.
[23] Nicky J. Welton, et al. Value of Information, 2015, Medical Decision Making.
[24] Deniz Erdogmus, et al. Information Theoretic Learning, 2005, Encyclopedia of Artificial Intelligence.
[25] Shie Mannor, et al. Reinforcement Learning with Gaussian Processes, 2005, ICML.
[26] Huaguang Zhang, et al. Neural-Network-Based Robust Optimal Tracking Control for MIMO Discrete-Time Systems with Unknown Uncertainty Using Adaptive Critic Design, 2018, IEEE Transactions on Neural Networks and Learning Systems.
[27] Panos M. Pardalos, et al. Reinforcement Learning in Video Games Using Nearest Neighbor Interpolation and Metric Learning, 2016, IEEE Transactions on Computational Intelligence and AI in Games.
[28] Michael I. Jordan, et al. Massachusetts Institute of Technology Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[29] Lihong Li, et al. Reinforcement Learning in Finite MDPs: PAC Analysis, 2009, J. Mach. Learn. Res.
[30] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002.
[31] Joelle Pineau, et al. Model-Based Bayesian Reinforcement Learning in Large Structured Domains, 2008, UAI.
[32] Claude-Nicolas Fiechter. Expected Mistake Bound Model for On-Line Reinforcement Learning, 1997, ICML.
[33] Ronen I. Brafman, et al. A Near-Optimal Polynomial Time Algorithm for Learning in Certain Classes of Stochastic Games, 2000, Artif. Intell.
[34] Steven D. Whitehead, et al. Complexity and Cooperation in Q-Learning, 1991, ML.
[35] Carl E. Rasmussen, et al. Gaussian Processes in Reinforcement Learning, 2003, NIPS.
[36] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[37] Ronen I. Brafman, et al. R-MAX: A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[38] Mohammad Ghavamzadeh, et al. Bayesian Actor-Critic Algorithms, 2007, ICML '07.
[39] José Carlos Príncipe, et al. Partitioning Relational Matrices of Similarities or Dissimilarities Using the Value of Information, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[40] Dongbin Zhao, et al. Iterative Adaptive Dynamic Programming for Solving Unknown Nonlinear Zero-Sum Game Based on Online Data, 2017, IEEE Transactions on Neural Networks and Learning Systems.
[41] Manfred K. Warmuth, et al. On the Worst-Case Analysis of Temporal-Difference Learning Algorithms, 2005, Machine Learning.
[42] Chong Li, et al. Model-Free Reinforcement Learning, 2019, Reinforcement Learning for Cyber-Physical Systems.
[43] José Carlos Príncipe, et al. An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits, 2017, Entropy.
[44] Claude-Nicolas Fiechter, et al. Efficient Reinforcement Learning, 1994, COLT '94.