Configurable Markov Decision Processes
[1] Robert L. Smith,et al. A Linear Programming Approach to Nonstationary Infinite-Horizon Markov Decision Processes , 2013, Oper. Res..
[2] Chelsea C. White,et al. Markov Decision Processes with Imprecise Transition Probabilities , 1994, Oper. Res..
[3] Sham M. Kakade,et al. On the sample complexity of reinforcement learning , 2003 .
[4] Robert Givan,et al. Bounded-parameter Markov decision processes , 2000, Artif. Intell..
[5] R. Rubinstein. The Cross-Entropy Method for Combinatorial and Continuous Optimization , 1999 .
[6] F. Cozman,et al. Representing and solving factored Markov decision processes with imprecise probabilities , 2009 .
[7] Marek Petrik,et al. Safe Policy Improvement by Minimizing Robust Baseline Regret , 2016, NIPS.
[8] Elena Deza,et al. Encyclopedia of Distances , 2014 .
[9] L. V. D. Heyden,et al. Perturbation bounds for the stationary probabilities of a finite Markov chain , 1984 .
[10] John Langford,et al. Approximately Optimal Approximate Reinforcement Learning , 2002, ICML.
[11] John M. Cozzolino,et al. Markovian Decision Processes with Uncertain Transition Probabilities , 1965 .
[12] Daniele Calandriello,et al. Safe Policy Iteration , 2013, ICML.
[13] Yasemin Altun,et al. Relative Entropy Policy Search , 2010 .
[14] Takayuki Osogami,et al. Robust partially observable Markov decision process , 2015, ICML.
[15] Marcello Restelli,et al. Adaptive Batch Size for Safe Policy Gradients , 2017, NIPS.
[16] David E. Goldberg,et al. Genetic Algorithms in Search Optimization and Machine Learning , 1988 .
[17] Shimon Whiteson,et al. OFFER: Off-Environment Reinforcement Learning , 2017, AAAI.
[18] Stephen J. Wright,et al. A Fast and Reliable Policy Improvement Algorithm , 2016, AISTATS.
[19] Denis Deratani Mauá,et al. Modeling Markov Decision Processes with Imprecise Probabilities Using Probabilistic Logic Programming , 2017, ISIPTA.
[20] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[21] Luca Bascetta,et al. Adaptive Step-Size for Policy Gradient Methods , 2013, NIPS.
[22] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[23] Bruce Lee Bowerman,et al. Nonstationary Markov decision processes and related topics in nonstationary Markov chains , 1974 .
[24] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[25] Pieter Abbeel,et al. Constrained Policy Optimization , 2017, ICML.
[26] Pieter Abbeel,et al. Reverse Curriculum Generation for Reinforcement Learning , 2017, CoRL.
[27] Thomas L. Griffiths,et al. Faster Teaching by POMDP Planning , 2011, AIED.
[28] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[29] Zhi-Qiang Liu,et al. Bounded-Parameter Partially Observable Markov Decision Processes , 2008, ICAPS.
[30] Robert L. Smith,et al. Solution and Forecast Horizons for Infinite-Horizon Nonhomogeneous Markov Decision Processes , 2007, Math. Oper. Res..
[31] Robert L. Smith,et al. Solving Nonstationary Infinite Horizon Dynamic Optimization Problems , 2000 .
[32] Robert L. Smith,et al. A New Optimality Criterion for Nonhomogeneous Markov Decision Processes , 1987, Oper. Res..