Dilan Görür | Nevena Lazic | Nir Levine | Mehrdad Farajtabar | Dale Schuurmans | Dong Yin | Chris Harris
[1] Nan Jiang, et al. $Q^\star$ Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison, 2020, arXiv:2003.03924.
[2] Bo Dai, et al. Off-Policy Evaluation via the Regularized Lagrangian, 2020, NeurIPS.
[3] Stephen P. Boyd, et al. CVXPY: A Python-Embedded Modeling Language for Convex Optimization, 2016, J. Mach. Learn. Res.
[4] Nir Friedman, et al. Probabilistic Graphical Models - Principles and Techniques, 2009.
[5] Ilya Kostrikov, et al. AlgaeDICE: Policy Gradient from Arbitrary Experience, 2019, ArXiv.
[6] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[7] Jan Peters, et al. Policy evaluation with temporal differences: a survey and comparison, 2015, J. Mach. Learn. Res.
[8] Jie Chen, et al. Stochastic Gradient Descent with Biased but Consistent Gradient Estimators, 2018, ArXiv.
[9] Arthur Charpentier, et al. The Dirichlet distribution, 2012.
[10] Csaba Szepesvári, et al. Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path, 2006, COLT.
[11] Huizhen Yu, et al. Convergence of Least Squares Temporal Difference Methods Under General Conditions, 2010, ICML.
[12] W. Kahan, et al. The Rotation of Eigenvectors by a Perturbation. III, 1970.
[13] Sham M. Kakade, et al. Provably Efficient Maximum Entropy Exploration, 2018, ICML.
[14] Marek Petrik, et al. Finite-Sample Analysis of Proximal Gradient TD Algorithms, 2015, UAI.
[15] George Konidaris, et al. Value Function Approximation in Reinforcement Learning Using the Fourier Basis, 2011, AAAI.
[16] Bo Dai, et al. DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections, 2019, NeurIPS.
[17] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition, 1999, J. Artif. Intell. Res.
[18] Xi Chen, et al. Large-Scale Markov Decision Problems via the Linear Programming Dual, 2019, ArXiv.
[19] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[20] Tommi S. Jaakkola, et al. Maximum Entropy Discrimination, 1999, NIPS.
[21] Peter L. Bartlett, et al. POLITEX: Regret Bounds for Policy Iteration using Expert Prediction, 2019, ICML.
[22] Masatoshi Uehara, et al. Minimax Weight and Q-Function Learning for Off-Policy Evaluation, 2019, ICML.
[23] Qiang Liu, et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation, 2018, NeurIPS.
[24] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML.
[25] Chandler Davis. The rotation of eigenvectors by a perturbation, 1963.
[26] Richard S. Sutton, et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1996.
[27] Mengdi Wang, et al. Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound, 2019, ICML.
[28] Richard S. Sutton, et al. Weighted importance sampling for off-policy learning with linear function approximation, 2014, NIPS.
[29] Dimitri P. Bertsekas, et al. Convergence Results for Some Temporal Difference Methods Based on Least Squares, 2009, IEEE Transactions on Automatic Control.
[30] Benjamin Recht, et al. Random Features for Large-Scale Kernel Machines, 2007, NIPS.
[31] Joel A. Tropp, et al. User-Friendly Tail Bounds for Sums of Random Matrices, 2010, Found. Comput. Math.
[32] Matthieu Geist, et al. Off-policy learning with eligibility traces: a survey, 2013, J. Mach. Learn. Res.
[33] Reuven Y. Rubinstein, et al. Simulation and the Monte Carlo Method, 1981, Wiley Series in Probability and Mathematical Statistics.
[34] Tengyao Wang, et al. A useful variant of the Davis–Kahan theorem for statisticians, 2014, arXiv:1405.0680.
[35] Bo Dai, et al. GenDICE: Generalized Offline Estimation of Stationary Values, 2020, ICLR.
[36] Bo Dai, et al. Reinforcement Learning via Fenchel-Rockafellar Duality, 2020, ArXiv.
[37] Steven J. Bradtke, et al. Linear Least-Squares Algorithms for Temporal Difference Learning, 1996, Machine Learning.
[38] Leemon C. Baird. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[39] Avinatan Hassidim, et al. Online Linear Quadratic Control, 2018, ICML.
[40] Huan Xu, et al. Large Scale Markov Decision Processes with Changing Rewards, 2019, NeurIPS.
[41] Mengdi Wang, et al. Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation, 2020, ICML.
[42] Qiang Liu, et al. A Kernel Loss for Solving the Bellman Equation, 2019, NeurIPS.
[43] Bo Dai, et al. Batch Stationary Distribution Estimation, 2020, ICML.
[44] Alessandro Lazaric, et al. Finite-Sample Analysis of LSTD, 2010, ICML.
[45] Nikolai Matni, et al. On the Sample Complexity of the Linear Quadratic Regulator, 2017, Foundations of Computational Mathematics.