Probabilistic inference for determining options in reinforcement learning
Christian Daniel | Herke van Hoof | Jan Peters | Gerhard Neumann
[1] L. Baum, et al. An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process, 1972.
[2] Geoffrey E. Hinton, et al. Feudal Reinforcement Learning, 1992, NIPS.
[3] Leslie Pack Kaelbling, et al. Hierarchical Learning in Stochastic Domains: Preliminary Results, 1993, ICML.
[4] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[5] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.
[6] Stuart J. Russell, et al. Reinforcement Learning with Hierarchies of Machines, 1997, NIPS.
[7] Doina Precup, et al. Intra-Option Learning about Temporally Abstract Actions, 1998, ICML.
[8] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[9] Tobias Scheffer, et al. International Conference on Machine Learning (ICML-99), 1999, Künstliche Intell.
[10] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition, 1999, J. Artif. Intell. Res.
[11] Andrew G. Barto, et al. Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density, 2001, ICML.
[12] Jun Morimoto, et al. Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning, 2000, Robotics Auton. Syst.
[13] Doina Precup, et al. Learning Options in Reinforcement Learning, 2002, SARA.
[14] Shie Mannor, et al. Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning, 2002, ECML.
[15] Kazuhito Yokoi, et al. Biped walking pattern generation by using preview control of zero-moment point, 2003, IEEE International Conference on Robotics and Automation.
[16] Sridhar Mahadevan, et al. Hierarchical Policy Gradient Algorithms, 2003, ICML.
[17] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[18] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[19] Ben Tse, et al. Autonomous Inverted Helicopter Flight via Reinforcement Learning, 2004, ISER.
[20] Peter Dayan, et al. Technical Note: Q-Learning, 1992, Machine Learning.
[21] Nuttapong Chentanez, et al. Intrinsically Motivated Learning of Hierarchical Collections of Skills, 2004.
[22] Alicia P. Wolfe, et al. Identifying useful subgoals in reinforcement learning by local graph partitioning, 2005, ICML.
[23] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[24] Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics), 2006.
[25] Andrew G. Barto, et al. Skill Characterization Based on Betweenness, 2008, NIPS.
[26] Thomas G. Dietterich, et al. Automatic discovery and transfer of MAXQ hierarchies, 2008, ICML.
[27] Michael I. Jordan, et al. Sharing Features among Dynamical Systems with Beta Processes, 2009, NIPS.
[28] Andrew G. Barto, et al. Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining, 2009, NIPS.
[29] Jan Peters, et al. Policy Search for Motor Primitives in Robotics, 2011, Machine Learning.
[30] Yasemin Altun, et al. Relative Entropy Policy Search, 2010.
[31] Stefan Schaal, et al. A Generalized Path Integral Control Approach to Reinforcement Learning, 2010, J. Mach. Learn. Res.
[32] Scott Niekum, et al. Clustering via Dirichlet Process Mixture Models for Portable Skill Discovery, 2011, Lifelong Learning.
[33] Nahum Shimkin, et al. Unified Inter and Intra Options Learning Using Policy Gradient Methods, 2011, EWRL.
[34] Stefan Schaal, et al. Hierarchical reinforcement learning with movement primitives, 2011, IEEE-RAS International Conference on Humanoid Robots.
[35] Leslie Pack Kaelbling, et al. Bayesian Policy Search with Policy Priors, 2011, IJCAI.
[36] George Konidaris, et al. Value Function Approximation in Reinforcement Learning Using the Fourier Basis, 2011, AAAI.
[37] David Silver, et al. Compositional Planning Using Optimal Option Models, 2012, ICML.
[38] Jan Peters, et al. Hierarchical Relative Entropy Policy Search, 2014, AISTATS.
[39] Scott Niekum, et al. Learning and generalization of complex tasks from unstructured demonstrations, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[40] Bruno Castro da Silva, et al. Learning Parameterized Skills, 2012, ICML.
[41] Jan Peters, et al. Probabilistic Movement Primitives, 2013, NIPS.
[42] Oliver Kroemer, et al. Learning sequential motor tasks, 2013, IEEE International Conference on Robotics and Automation.
[43] Andreas Krause, et al. Advances in Neural Information Processing Systems (NIPS), 2014.
[44] Shie Mannor, et al. Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations, 2014, ICML.
[45] Jan Peters, et al. Learning of Non-Parametric Control Policies with High-Dimensional State Features, 2015, AISTATS.
[46] Pravesh Ranchod, et al. Nonparametric Bayesian reward segmentation for skill discovery using inverse reinforcement learning, 2015, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).