Approximate planning for Bayesian hierarchical reinforcement learning
Ngo Anh Vien | Hung Quoc Ngo | Sungyoung Lee | TaeChoong Chung