[1] M. Puterman, et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems, 1978.
[2] Chelsea C. White, et al. Markov Decision Processes with Imprecise Transition Probabilities, 1994, Oper. Res.
[3] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[4] Yurii Nesterov, et al. Interior-Point Polynomial Algorithms in Convex Programming, 1994, SIAM Studies in Applied Mathematics.
[5] Laurent El Ghaoui, et al. Robustness in Markov Decision Problems with Uncertain Transition Matrices, 2003, NIPS.
[6] John N. Tsitsiklis, et al. Bias and Variance in Value Function Estimation, 2004, ICML.
[7] Garud Iyengar, et al. Robust Dynamic Programming, 2005, Math. Oper. Res.
[8] Shie Mannor, et al. Distributionally Robust Markov Decision Processes, 2010, Math. Oper. Res.
[9] Anind K. Dey, et al. Modeling Interaction via the Principle of Maximum Causal Entropy, 2010, ICML.
[10] Sergey Levine, et al. Nonlinear Inverse Reinforcement Learning with Gaussian Processes, 2011, NIPS.
[11] Shie Mannor, et al. Lightning Does Not Strike Twice: Robust MDPs with Coupled Uncertainty, 2012, ICML.
[12] Hilbert J. Kappen, et al. Dynamic Policy Programming, 2010, J. Mach. Learn. Res.
[13] Andrew J. Schaefer, et al. Robust Modified Policy Iteration, 2013, INFORMS J. Comput.
[14] Daniel Kuhn, et al. Robust Markov Decision Processes, 2013, Math. Oper. Res.
[15] N. Bambos, et al. Infinite Time Horizon Maximum Causal Entropy Inverse Reinforcement Learning, 2014, 53rd IEEE Conference on Decision and Control.
[16] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.
[17] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[18] Matthieu Geist, et al. Approximate Modified Policy Iteration and Its Application to the Game of Tetris, 2015, J. Mach. Learn. Res.
[19] Felix Brandt, et al. An Ordinal Minimax Theorem, 2014, Games Econ. Behav.
[20] Roy Fox, et al. Taming the Noise in Reinforcement Learning via Soft Updates, 2015, UAI.
[21] Stefano Ermon, et al. Generative Adversarial Imitation Learning, 2016, NIPS.
[22] Shie Mannor, et al. Reinforcement Learning in Robust Markov Decision Processes, 2013, Math. Oper. Res.
[23] Sergey Levine, et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[24] Pieter Abbeel, et al. Equivalence Between Policy Gradients and Soft Q-Learning, 2017, arXiv.
[25] Tony Jebara, et al. Frank-Wolfe Algorithms for Saddle Point Problems, 2016, AISTATS.
[26] Henry Zhu, et al. Soft Actor-Critic Algorithms and Applications, 2018, arXiv.
[27] Sergey Levine, et al. Learning Robust Rewards with Adversarial Inverse Reinforcement Learning, 2017, ICLR.
[28] Yuval Tassa, et al. Relative Entropy Regularized Policy Iteration, 2018, arXiv.
[29] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[30] Marek Petrik, et al. Fast Bellman Updates for Robust MDPs, 2018, ICML.
[31] Yuval Tassa, et al. Maximum a Posteriori Policy Optimisation, 2018, ICLR.
[32] Kyungjae Lee, et al. Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning, 2018, IEEE Robotics and Automation Letters.
[33] Matthieu Geist, et al. A Theory of Regularized Markov Decision Processes, 2019, ICML.
[34] Lantao Yu, et al. Multi-Agent Adversarial Inverse Reinforcement Learning, 2019, ICML.
[35] Sergey Levine, et al. If MaxEnt RL Is the Answer, What Is the Question?, 2019, arXiv.
[36] V. Cevher, et al. Robust Inverse Reinforcement Learning under Transition Dynamics Mismatch, 2020, NeurIPS.
[37] Martin A. Riedmiller, et al. Robust Reinforcement Learning for Continuous Control with Model Misspecification, 2019, ICLR.