Bayes-adaptive hierarchical MDPs