Bayes-adaptive hierarchical MDPs
