Bayes-adaptive hierarchical MDPs
[1] Ngo Anh Vien, et al. Touch based POMDP manipulation via sequential submodular optimization, 2015, 2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids).
[2] Marc Toussaint, et al. POMDP manipulation via trajectory optimization, 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[3] Marc Toussaint, et al. Hierarchical Monte-Carlo Planning, 2015, AAAI.
[4] Nina Dethlefs, et al. Nonstrict Hierarchical Reinforcement Learning for Interactive Systems and Robots, 2014, TIIS.
[5] Ngo Anh Vien, et al. Approximate planning for Bayesian hierarchical reinforcement learning, 2014, Applied Intelligence.
[6] Marc Toussaint, et al. Model-Based Relational RL When Object Existence is Partially Observable, 2014, ICML.
[7] Wolfgang Ertel, et al. Monte Carlo Bayesian hierarchical reinforcement learning, 2014, AAMAS.
[8] Nicholas Roy, et al. Efficient Planning under Uncertainty with Macro-actions, 2014, J. Artif. Intell. Res..
[9] TaeChoong Chung, et al. Learning via human feedback in continuous state and action spaces, 2013, Applied Intelligence.
[10] Wolfgang Ertel, et al. Monte-Carlo tree search for Bayesian reinforcement learning, 2012, Applied Intelligence.
[11] Feng Cao, et al. Bayesian Hierarchical Reinforcement Learning, 2012, NIPS.
[12] David Hsu, et al. Monte Carlo Bayesian Reinforcement Learning, 2012, ICML.
[13] Peter Dayan, et al. Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search, 2012, NIPS.
[14] David Hsu, et al. Monte Carlo Value Iteration with Macro-Actions, 2011, NIPS.
[15] Michael L. Littman, et al. Learning is planning: near Bayes-optimal reinforcement learning via Monte-Carlo tree search, 2011, UAI.
[16] TaeChoong Chung, et al. Hessian matrix distribution for Bayesian policy gradient reinforcement learning, 2011, Inf. Sci..
[17] Joel Veness, et al. Monte-Carlo Planning in Large POMDPs, 2010, NIPS.
[18] Marc Toussaint, et al. Planning with Noisy Probabilistic Relational Rules, 2010, J. Artif. Intell. Res..
[19] Nicholas Roy, et al. PUMA: Planning Under Uncertainty with Macro-Actions, 2010, AAAI.
[20] TaeChoong Chung, et al. Policy Gradient Based Semi-Markov Decision Problems: Approximation and Estimation Errors, 2010, IEICE Trans. Inf. Syst..
[21] Andrew G. Barto, et al. Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining, 2009, NIPS.
[22] Paloma Martínez, et al. Learning teaching strategies in an Adaptive and Intelligent Educational System through Reinforcement Learning, 2009, Applied Intelligence.
[23] Andrew G. Barto, et al. Skill Characterization Based on Betweenness, 2008, NIPS.
[24] TaeChoong Chung, et al. Policy Gradient Semi-Markov Decision Process, 2008, 2008 20th IEEE International Conference on Tools with Artificial Intelligence.
[25] Joelle Pineau, et al. Model-Based Bayesian Reinforcement Learning in Large Structured Domains, 2008, UAI.
[26] Nguyen Hoang Viet, et al. Policy Gradient SMDP for Resource Allocation and Routing in Integrated Services Networks, 2008, 2008 IEEE International Conference on Networking, Sensing and Control.
[27] Nguyen Hoang Viet, et al. Obstacle Avoidance Path Planning for Mobile Robot Based on Multi Colony Ant Algorithm, 2008, First International Conference on Advances in Computer-Human Interaction.
[28] Joelle Pineau, et al. Bayes-Adaptive POMDPs, 2007, NIPS.
[29] TaeChoong Chung, et al. Natural Gradient Policy for Average Cost SMDP Problem, 2007, 19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2007).
[30] Caro Lucas, et al. A Dynamic Fuzzy-Based Crossover Method for Genetic Algorithms, 2007, 19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2007).
[31] Nguyen Hoang Viet, et al. Heuristic Search Based Exploration in Reinforcement Learning, 2007, IWANN.
[32] Mohammad Ghavamzadeh, et al. Bayesian actor-critic algorithms, 2007, ICML '07.
[33] Nguyen Hoang Viet, et al. Obstacle Avoidance Path Planning for Mobile Robot Based on Ant-Q Reinforcement Learning Algorithm, 2007, ISNN.
[34] Doina Precup, et al. Using Linear Programming for Bayesian Exploration in Markov Decision Processes, 2007, IJCAI.
[35] Mohammad Ghavamzadeh, et al. Bayesian Policy Gradient Algorithms, 2006, NIPS.
[36] Pieter Abbeel, et al. An Application of Reinforcement Learning to Aerobatic Helicopter Flight, 2006, NIPS.
[37] Jesse Hoey, et al. An analytic solution to discrete Bayesian reinforcement learning, 2006, ICML.
[38] Tao Wang, et al. Bayesian sparse sampling for on-line reward optimization, 2005, ICML.
[39] Shie Mannor, et al. Reinforcement learning with Gaussian processes, 2005, ICML.
[40] Andrew G. Barto, et al. Using relative novelty to identify useful temporal abstractions in reinforcement learning, 2004, ICML.
[41] Leslie Pack Kaelbling, et al. Approximate Planning in POMDPs with Macro-Actions, 2003, NIPS.
[42] Shie Mannor, et al. Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning, 2003, ICML.
[43] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[44] Andrew G. Barto, et al. Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density, 2001, ICML.
[45] Andrew Tridgell, et al. Learning to Play Chess Using Temporal Differences, 2000, Machine Learning.
[46] Malcolm J. A. Strens, et al. A Bayesian Framework for Reinforcement Learning, 2000, ICML.
[47] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell..
[48] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition, 1999, J. Artif. Intell. Res..
[49] Richard S. Sutton, et al. Roles of Macro-Actions in Accelerating Reinforcement Learning, 1998.
[50] Milos Hauskrecht, et al. Hierarchical Solution of Markov Decision Processes using Macro-actions, 1998, UAI.
[51] Stuart J. Russell, et al. Bayesian Q-Learning, 1998, AAAI/IAAI.
[52] Christopher G. Atkeson, et al. Nonparametric Model-Based Reinforcement Learning, 1997, NIPS.
[53] Satinder Singh, et al. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems, 1996, NIPS.
[54] Wei Zhang, et al. A Reinforcement Learning Approach to Job-Shop Scheduling, 1995, IJCAI.
[55] Gerald Tesauro, et al. Temporal difference learning and TD-Gammon, 1995, CACM.
[56] Gerald Tesauro, et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play, 1994, Neural Computation.
[57] Gerald Tesauro, et al. Practical issues in temporal difference learning, 1992, Machine Learning.
[58] Chelsea C. White, et al. Procedures for the Solution of a Finite-Horizon, Partially Observed, Semi-Markov Optimization Problem, 1976, Oper. Res..
[59] Arthur L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev..
[60] Stephen W. Carden, et al. An Introduction to Reinforcement Learning, 2013.
[61] David Hsu, et al. Monte Carlo Value Iteration for Continuous-State POMDPs, 2010, WAFR.
[62] Nguyen Hoang Viet, et al. Q-Learning Based Univector Field Navigation Method for Mobile Robots, 2007.
[63] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[64] Vittaldas V. Prabhu, et al. Distributed Reinforcement Learning Control for Batch Sequencing and Sizing in Just-In-Time Manufacturing Systems, 2004, Applied Intelligence.
[65] Sridhar Mahadevan, et al. Recent Advances in Hierarchical Reinforcement Learning, 2003, Discret. Event Dyn. Syst..
[66] Andrew G. Barto, et al. Optimal learning: computational procedures for Bayes-adaptive Markov decision processes, 2002.
[67] S. Thrun, et al. An integrated approach to hierarchy and abstraction for POMDPs, 2002.