Ofir Nachum | Aldo Pacchiano | Jonathan Lee | Peter Bartlett
[1] Luís Paulo Reis, et al. Model-Based Relative Entropy Stochastic Search, 2016, NIPS.
[2] Gergely Neu, et al. Logistic Q-Learning, 2020, AISTATS.
[3] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[4] Yasemin Altun, et al. Relative Entropy Policy Search, 2010.
[5] Gergely Neu, et al. Online learning in episodic Markovian decision processes by relative entropy policy search, 2013, NIPS.
[6] Jon D. McAuliffe, et al. Uniform, nonparametric, non-asymptotic confidence sequences, 2018.
[7] Bo Dai, et al. Reinforcement Learning via Fenchel-Rockafellar Duality, 2020, ArXiv.
[8] Mengdi Wang, et al. Randomized Linear Programming Solves the Discounted Markov Decision Problem In Nearly-Linear (Sometimes Sublinear) Running Time, 2017, arXiv:1704.01869.
[9] Jan Peters, et al. Hierarchical Relative Entropy Policy Search, 2014, AISTATS.
[10] Ilya Kostrikov, et al. Imitation Learning via Off-Policy Distribution Matching, 2019, ICLR.
[11] Mengdi Wang, et al. Primal-Dual π Learning: Sample Complexity and Sublinear Run Time for Ergodic Markov Decision Problems, 2017, ArXiv.
[12] Matthieu Geist, et al. A Theory of Regularized Markov Decision Processes, 2019, ICML.
[13] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[14] Byron Boots, et al. A Reduction from Reinforcement Learning to No-Regret Online Learning, 2020, AISTATS.
[15] Henry Zhu, et al. Soft Actor-Critic Algorithms and Applications, 2018, ArXiv.
[16] Bruno Scherrer, et al. Leverage the Average: an Analysis of Regularization in RL, 2020, ArXiv.
[17] Aaron Sidford, et al. Efficiently Solving MDPs with Stochastic Mirror Descent, 2020, ICML.
[18] Yuval Tassa, et al. Maximum a Posteriori Policy Optimisation, 2018, ICLR.
[19] Ilya Kostrikov, et al. AlgaeDICE: Policy Gradient from Arbitrary Experience, 2019, ArXiv.
[20] Jan Peters, et al. Relative Entropy Inverse Reinforcement Learning, 2011, AISTATS.
[21] Vicenç Gómez, et al. A unified view of entropy-regularized Markov decision processes, 2017, ArXiv.
[22] Mengdi Wang, et al. Stochastic Primal-Dual Methods and Sample Complexity of Reinforcement Learning, 2016, ArXiv.
[23] Roy Fox, et al. Taming the Noise in Reinforcement Learning via Soft Updates, 2015, UAI.
[24] Jan Peters, et al. f-Divergence constrained policy improvement, 2017, ArXiv.
[25] Zeyuan Allen-Zhu, et al. Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent, 2014, ITCS.
[26] Gergely Neu, et al. Faster saddle-point optimization for solving large-scale Markov decision processes, 2020, L4DC.
[27] Dale Schuurmans, et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning, 2017, NIPS.
[28] Lihong Li, et al. Scalable Bilinear π Learning Using State and Action Features, 2018, ICML.
[29] Marc Teboulle, et al. Mirror descent and nonlinear projected subgradient methods for convex optimization, 2003, Oper. Res. Lett.