A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation