[1] Dirk P. Kroese, et al. Simulation and the Monte Carlo Method (Wiley Series in Probability and Statistics), 1981.
[2] Richard S. Sutton, et al. Temporal credit assignment in reinforcement learning, 1984.
[3] C. Watkins. Learning from delayed rewards, 1989.
[4] Richard S. Sutton, et al. Dyna, an integrated architecture for learning, planning, and reacting, 1990, SIGART Bulletin.
[5] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[6] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[7] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[8] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[9] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[10] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[11] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition, 1999, J. Artif. Intell. Res.
[12] Doina Precup. Temporal abstraction in reinforcement learning, Ph.D. thesis, University of Massachusetts Amherst, 2000.
[13] Andrew G. Barto, et al. Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density, 2001, ICML.
[14] Doina Precup, et al. Learning Options in Reinforcement Learning, 2002, SARA.
[15] Shie Mannor, et al. Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning, 2002, ECML.
[16] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[17] J. Tsitsiklis, et al. Convergence rate of linear two-time-scale stochastic approximation, 2004, math/0405287.
[18] Andrew G. Barto, et al. Using relative novelty to identify useful temporal abstractions in reinforcement learning, 2004, ICML.
[19] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[20] Stefan Schaal, et al. Natural Actor-Critic, 2008, Neurocomputing.
[21] Andrew G. Barto, et al. Skill Characterization Based on Betweenness, 2008, NIPS.
[22] Andrew G. Barto, et al. Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining, 2009, NIPS.
[23] Satinder P. Singh, et al. Linear options, 2010, AAMAS.
[24] Doina Precup, et al. Optimal policy switching algorithms for reinforcement learning, 2010, AAMAS.
[25] Scott Niekum, et al. Clustering via Dirichlet Process Mixture Models for Portable Skill Discovery, 2011, Lifelong Learning.
[26] Nahum Shimkin, et al. Unified Inter and Intra Options Learning Using Policy Gradient Methods, 2011, EWRL.
[27] Scott Kuindersma, et al. Autonomous Skill Acquisition on a Mobile Manipulator, 2011, AAAI.
[28] Martha White, et al. Linear Off-Policy Actor-Critic, 2012, ICML.
[29] David Silver, et al. Compositional Planning Using Optimal Option Models, 2012, ICML.
[30] Yee Whye Teh, et al. Actor-Critic Reinforcement Learning with Energy-Based Policies, 2012, EWRL.
[31] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[32] Scott Niekum, et al. Semantically Grounded Learning from Unstructured Demonstrations, 2013.
[33] Philip Thomas, et al. Bias in Natural Actor-Critic Algorithms, 2014, ICML.
[34] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[35] Shie Mannor, et al. Time-Regularized Interrupting Options (TRIO), 2014, ICML.
[36] Shie Mannor, et al. Approximate Value Iteration with Temporally Extended Actions, 2015, J. Artif. Intell. Res.
[37] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract), 2012, IJCAI.
[38] Balaraman Ravindran, et al. Hierarchical Reinforcement Learning using Spatio-Temporal Abstractions and Deep Neural Networks, 2016, ArXiv.
[39] Alex Graves, et al. Strategic Attentive Writer for Learning Macro-Actions, 2016, NIPS.
[40] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[41] Jan Peters, et al. Probabilistic inference for determining options in reinforcement learning, 2016, Machine Learning.
[42] Joshua B. Tenenbaum, et al. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, 2016, NIPS.
[43] Shie Mannor, et al. Adaptive Skills Adaptive Partitions (ASAP), 2016, NIPS.