Finding Options that Minimize Planning Time
Yuu Jinnai | David Abel | Michael L. Littman | George Konidaris
[1] Vasek Chvátal et al. A Greedy Heuristic for the Set-Covering Problem, 1979, Math. Oper. Res.
[2] Andrew G. Barto et al. Skill Characterization Based on Betweenness, 2008, NIPS.
[3] Martin L. Puterman et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[4] Marlos C. Machado et al. A Laplacian Framework for Option Discovery in Reinforcement Learning, 2017, ICML.
[5] Sergey Levine et al. Diversity is All You Need: Learning Skills without a Reward Function, 2018, ICLR.
[6] Alicia P. Wolfe et al. Identifying useful subgoals in reinforcement learning by local graph partitioning, 2005, ICML.
[7] Ran Raz et al. A sub-constant error-probability low-degree test, and a sub-constant error-probability PCP characterization of NP, 1997, STOC '97.
[8] Pierre-Luc Bacon. On the Bottleneck Concept for Options Discovery: Theoretical Underpinnings and Extension in Continuous State Spaces, 2013.
[9] Yishay Mansour et al. Approximate Equivalence of Markov Decision Processes, 2003, COLT.
[10] Alireza Khadivi et al. Automatic skill acquisition in reinforcement learning using graph centrality measures, 2012, Intell. Data Anal.
[11] Ran Raz et al. Label Cover Instances with Large Girth and the Hardness of Approximating Basic k-Spanner, 2012, ICALP.
[12] Andrew G. Barto et al. PolicyBlocks: An Algorithm for Creating Useful Macro-Actions in Reinforcement Learning, 2002, ICML.
[13] Leslie Pack Kaelbling et al. On the Complexity of Solving Markov Decision Problems, 1995, UAI.
[14] Shie Mannor et al. Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations, 2014, ICML.
[15] Andrew G. Barto et al. Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density, 2001, ICML.
[16] Lihong Li et al. PAC-inspired Option Discovery in Lifelong Reinforcement Learning, 2014, ICML.
[17] George Konidaris et al. Constructing Abstraction Hierarchies Using a Skill-Symbol Loop, 2015, IJCAI.
[18] Richard S. Sutton et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[19] Shie Mannor et al. Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning, 2002, ECML.
[20] Bram Bakker et al. Hierarchical Reinforcement Learning Based on Subgoal Discovery and Subpolicy Specialization, 2003.
[21] Alec Solway et al. Optimal Behavioral Hierarchy, 2014, PLoS Comput. Biol.
[22] John N. Tsitsiklis et al. The Complexity of Markov Decision Processes, 1987, Math. Oper. Res.
[23] Peter Stone et al. The utility of temporal abstraction in reinforcement learning, 2008, AAMAS.
[24] Rina Panigrahy et al. An O(log* n) approximation algorithm for the asymmetric p-center problem, 1996, SODA '96.
[25] Guy Kortsarz. On the Hardness of Approximating Spanners, 2001, Algorithmica.
[26] Glenn A. Iba et al. A Heuristic Approach to the Discovery of Macro-Operators, 1989, Machine Learning.
[27] Ronald J. Williams et al. Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions, 1993.
[28] Kyomin Jung et al. Transitive-Closure Spanners, 2008, SIAM J. Comput.
[29] Romain Laroche et al. On Value Function Representation of Long Horizon Problems, 2018, AAAI.
[30] Ashwin Ram et al. The Utility Problem in Case-Based Reasoning, 1993.
[31] Michael L. Littman et al. Probabilistic Propositional Planning: Representations and Complexity, 1997, AAAI/IAAI.
[32] David Steurer et al. Analytical approach to parallel repetition, 2013, STOC.
[33] Aaron Archer. Two O(log* k)-Approximation Algorithms for the Asymmetric k-Center Problem, 2001, IPCO.
[34] Doina Precup et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[35] Doina Precup et al. When Waiting is not an Option: Learning Options with a Deliberation Cost, 2017, AAAI.
[36] David Silver et al. Compositional Planning Using Optimal Option Models, 2012, ICML.
[37] Sudipto Guha et al. Asymmetric k-center is log* n-hard to approximate, 2005, J. ACM.
[38] Andrew G. Barto et al. Using relative novelty to identify useful temporal abstractions in reinforcement learning, 2004, ICML.
[39] Andrew G. Barto et al. Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining, 2009, NIPS.
[40] Doina Precup et al. Learning Options in Reinforcement Learning, 2002, SARA.
[41] Dorit S. Hochbaum et al. Approximation Algorithms for the Set Covering and Vertex Cover Problems, 1982, SIAM J. Comput.
[42] H. Simon et al. Models of Man: Social and Rational, 1957.
[43] Irit Dinur et al. On the hardness of approximating label-cover, 2004, Inf. Process. Lett.
[44] Shie Mannor et al. Approximate Value Iteration with Temporally Extended Actions, 2015, J. Artif. Intell. Res.
[45] Michael L. Littman et al. The Complexity of Plan Existence and Evaluation in Probabilistic Domains, 1997, UAI.
[46] Doina Precup et al. Multi-time Models for Temporally Abstract Planning, 1997, NIPS.