[1] J. Kiefer, et al. Sequential minimax search for a maximum, 1953.
[2] Ronald A. Howard, et al. Dynamic Programming and Markov Processes, 1960.
[3] Boris Polyak. Gradient methods for the minimisation of functionals, 1963.
[4] W. Rudin. Principles of mathematical analysis, 1964.
[5] D. Kleinman. On an iterative technique for Riccati equation computations, 1968.
[6] G. Hewer. An iterative technique for the computation of the steady state gains for the discrete optimal regulator, 1971.
[7] Ward Whitt, et al. Approximations of Dynamic Programs, I, 1978, Math. Oper. Res.
[8] A. Peressini, et al. The Mathematics of Nonlinear Programming, 1988.
[9] P. Glynn, et al. Stochastic Optimization by Simulation: Convergence Proofs for the GI/G/1 Queue in Steady-State, 1994.
[10] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[11] K. Loparo, et al. Inequalities for the trace of matrix product, 1994, IEEE Trans. Autom. Control.
[12] John Rust. Using Randomization to Break the Curse of Dimensionality, 1997.
[13] Andrew G. Barto, et al. Adaptive linear quadratic control using policy iteration, 1994, Proceedings of 1994 American Control Conference - ACC '94.
[14] Michael I. Jordan, et al. Reinforcement Learning with Soft State Aggregation, 1994, NIPS.
[15] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[16] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[17] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[18] David K. Smith, et al. Dynamic Programming and Optimal Control. Volume 1, 1996.
[19] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[20] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[21] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[22] John N. Tsitsiklis, et al. Gradient Convergence in Gradient Methods with Errors, 1999, SIAM J. Optim.
[23] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[24] John N. Tsitsiklis, et al. Regression methods for pricing complex American-style options, 2001, IEEE Trans. Neural Networks.
[25] John N. Tsitsiklis, et al. Simulation-based optimization of Markov reward processes, 2001, IEEE Trans. Autom. Control.
[26] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[27] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[28] Rémi Munos, et al. Error Bounds for Approximate Policy Iteration, 2003, ICML.
[29] Doina Precup, et al. Metrics for Finite Markov Decision Processes, 2004, AAAI.
[30] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[31] John N. Tsitsiklis, et al. Feature-based methods for large scale dynamic programming, 2004, Machine Learning.
[32] Itir Z. Karaesmen, et al. Overbooking with Substitutable Inventory Classes, 2004, Oper. Res.
[33] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[34] Stephen P. Boyd, et al. Convex Optimization, 2004, Algorithms and Theory of Computation Handbook.
[35] Yurii Nesterov, et al. Cubic regularization of Newton method and its global performance, 2006, Math. Program.
[36] Csaba Szepesvári, et al. Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path, 2006, COLT.
[37] Stefan Schaal, et al. Policy Gradient Methods for Robotics, 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[38] Benjamin Van Roy. Performance Loss Bounds for Approximate Value Iteration with State Aggregation, 2006, Math. Oper. Res.
[39] Dimitri P. Bertsekas, et al. Stochastic optimal control: the discrete time case, 2007.
[40] Rémi Munos, et al. Performance Bounds in Lp-norm for Approximate Value Iteration, 2007, SIAM J. Control. Optim.
[41] Martin A. Riedmiller, et al. Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark, 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[42] Huseyin Topaloglu, et al. Using Stochastic Approximation Methods to Compute Optimal Base-Stock Levels in Inventory Control Problems, 2008, Oper. Res.
[43] Csaba Szepesvári, et al. Finite-Time Bounds for Fitted Value Iteration, 2008, J. Mach. Learn. Res.
[44] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, 2008, Texts and Readings in Mathematics.
[45] Garrett J. van Ryzin, et al. Simulation-Based Optimization of Virtual Nesting Controls for Network Revenue Management, 2008, Oper. Res.
[46] Yoram Singer, et al. Efficient projections onto the l1-ball for learning in high dimensions, 2008, ICML '08.
[47] Csaba Szepesvári, et al. Error Propagation for Approximate Policy and Value Iteration, 2010, NIPS.
[48] D. Bertsekas. Approximate policy iteration: a survey and some new methods, 2011.
[49] Robert Babuska, et al. A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients, 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[50] Ronald Ortner, et al. Online Regret Bounds for Undiscounted Continuous Reinforcement Learning, 2012, NIPS.
[51] Saeed Ghadimi, et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming, 2013, SIAM J. Optim.
[52] Dimitri P. Bertsekas, et al. Abstract Dynamic Programming, 2013.
[53] Philip Thomas. Bias in Natural Actor-Critic Algorithms, 2014, ICML.
[54] Matthieu Geist, et al. Local Policy Search in a Convex Space and Conservative Policy Iteration as Boosted Policy Search, 2014, ECML/PKDD.
[55] Francis Bach, et al. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives, 2014, NIPS.
[56] Woonghee Tim Huh, et al. Online Sequential Optimization with Biased Gradients: Theory and Applications to Censored Demand, 2014, INFORMS J. Comput.
[57] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[58] Lin Xiao, et al. A Proximal Stochastic Gradient Method with Progressive Variance Reduction, 2014, SIAM J. Optim.
[59] Bruno Scherrer. Approximate Policy Iteration Schemes: A Comparison, 2014, ICML.
[60] Sayan Mukherjee, et al. The Information Geometry of Mirror Descent, 2013, IEEE Transactions on Information Theory.
[61] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[62] Furong Huang, et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition, 2015, COLT.
[63] Kenji Kawaguchi. Deep Learning without Poor Local Minima, 2016, NIPS.
[64] Mark W. Schmidt, et al. Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition, 2016, ECML/PKDD.
[65] Saeed Ghadimi, et al. Accelerated gradient methods for nonconvex nonlinear and stochastic programming, 2013, Math. Program.
[66] Nathan Srebro, et al. Global Optimality of Local Search for Low Rank Matrix Recovery, 2016, NIPS.
[67] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[68] Alexander J. Smola, et al. Stochastic Frank-Wolfe methods for nonconvex optimization, 2016, 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[69] Alexander J. Smola, et al. Proximal Stochastic Methods for Nonsmooth Nonconvex Finite-Sum Optimization, 2016, NIPS.
[70] Michael I. Jordan, et al. Gradient Descent Only Converges to Minimizers, 2016, COLT.
[71] Tengyu Ma, et al. Matrix Completion has No Spurious Local Minimum, 2016, NIPS.
[72] Alexander J. Smola, et al. Stochastic Variance Reduction for Nonconvex Optimization, 2016, ICML.
[73] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[74] John Wright, et al. Complete Dictionary Recovery Over the Sphere I: Overview and the Geometric Picture, 2015, IEEE Transactions on Information Theory.
[75] Xi Chen, et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning, 2017, arXiv.
[76] Matthieu Geist, et al. Is the Bellman residual a bad proxy?, 2016, NIPS.
[77] Dale Schuurmans, et al. Improving Policy Gradient by Exploring Under-appreciated Rewards, 2016, ICLR.
[78] Sham M. Kakade, et al. Towards Generalization and Simplicity in Continuous Control, 2017, NIPS.
[79] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, arXiv.
[80] Michael I. Jordan, et al. How to Escape Saddle Points Efficiently, 2017, ICML.
[81] Tengyu Ma, et al. Finding approximate local minima faster than gradient descent, 2016, STOC.
[82] Marcin Andrychowicz, et al. Parameter Space Noise for Exploration, 2017, ICLR.
[83] Yurii Nesterov. Lectures on Convex Optimization, 2018.
[84] Sham M. Kakade, et al. Global Convergence of Policy Gradient Methods for Linearized Control Problems, 2018, ICML.
[85] Benjamin Recht, et al. Simple random search of static linear policies is competitive for reinforcement learning, 2018, NeurIPS.
[86] Sham M. Kakade, et al. Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator, 2018, ICML.
[87] Yair Carmon, et al. Accelerated Methods for Non-Convex Optimization, 2018, SIAM J. Optim.
[88] Damek Davis, et al. Proximally Guided Stochastic Subgradient Method for Nonsmooth, Nonconvex Problems, 2017, SIAM J. Optim.
[89] Dimitri P. Bertsekas, et al. Feature-based aggregation and deep reinforcement learning: a survey and some new implementations, 2018, IEEE/CAA Journal of Automatica Sinica.
[90] Zheng Wen, et al. Deep Exploration via Randomized Value Functions, 2017, J. Mach. Learn. Res.
[91] Dmitriy Drusvyatskiy, et al. Stochastic Subgradient Method Converges on Tame Functions, 2018, Foundations of Computational Mathematics.
[92] S. Kakade, et al. Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes, 2019, COLT.
[93] K. Schittkowski, et al. Nonlinear Programming, 2022.