暂无分享,去创建一个
[1] Byron Boots,et al. Online Learning with Continuous Variations: Dynamic Regret and Reductions , 2020, AISTATS.
[2] Byron Boots,et al. Accelerating Imitation Learning with Predictive Models , 2018, AISTATS.
[3] Byron Boots,et al. Predictor-Corrector Policy Optimization , 2018, ICML.
[4] Sham M. Kakade,et al. On the sample complexity of reinforcement learning. , 2003 .
[5] Martin Zinkevich,et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent , 2003, ICML.
[6] Alexander Shapiro,et al. Stochastic Approximation approach to Stochastic Programming , 2013 .
[7] W. Oettli,et al. From optimization and variational inequalities to equilibrium problems , 1994 .
[8] Mengdi Wang,et al. An online primal-dual method for discounted Markov decision processes , 2016, 2016 IEEE 55th Conference on Decision and Control (CDC).
[9] Gábor Lugosi,et al. Prediction, learning, and games , 2006 .
[10] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[11] Qihang Lin,et al. Revisiting Approximate Linear Programming Using a Saddle Point Based Reformulation and Root Finding Solution Approach , 2017 .
[12] Niao He,et al. Stochastic Primal-Dual Q-Learning , 2018, 1810.08298.
[13] E. Denardo,et al. Multichain Markov Renewal Programs , 1968 .
[14] O. Hernández-Lerma,et al. Discrete-time Markov control processes , 1999 .
[15] Lihong Li,et al. Scalable Bilinear π Learning Using State and Action Features , 2018, ICML 2018.
[16] Manfred K. Warmuth,et al. Exponentiated Gradient Versus Gradient Descent for Linear Predictors , 1997, Inf. Comput..
[17] Karthik Sridharan,et al. Online Learning with Predictable Sequences , 2012, COLT.
[18] Geoffrey J. Gordon,et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.
[19] Monica Bianchi,et al. Generalized monotone bifunctions and equilibrium problems , 1996 .
[20] Le Song,et al. Boosting the Actor with Dual Critic , 2017, ICLR.
[21] M. Habib. Probabilistic methods for algorithmic discrete mathematics , 1998 .
[22] Mengdi Wang,et al. Primal-Dual π Learning: Sample Complexity and Sublinear Run Time for Ergodic Markov Decision Problems , 2017, ArXiv.
[23] R Bellman,et al. On the Theory of Dynamic Programming. , 1952, Proceedings of the National Academy of Sciences of the United States of America.
[24] Alan S. Manne,et al. Linear Programming and Sequential Decision Models , 1959 .
[25] Mengdi Wang,et al. Stochastic Primal-Dual Methods and Sample Complexity of Reinforcement Learning , 2016, ArXiv.
[26] W. Fleming. Book Review: Discrete-time Markov control processes: Basic optimality criteria , 1997 .
[27] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
[28] A. Juditsky,et al. Solving variational inequalities with Stochastic Mirror-Prox algorithm , 2008, 0809.0815.
[29] C. McDiarmid. Concentration , 1862, The Dental register.
[30] John Langford,et al. Approximately Optimal Approximate Reinforcement Learning , 2002, ICML.
[31] Shalabh Bhatnagar,et al. A Linearly Relaxed Approximate Linear Program for Markov Decision Processes , 2017, IEEE Transactions on Automatic Control.
[32] Elad Hazan,et al. Introduction to Online Convex Optimization , 2016, Found. Trends Optim..
[33] Roger J.-B. Wets,et al. Variational Convergence of Bifunctions: Motivating Applications , 2014, SIAM J. Optim..
[34] Geoffrey J. Gordon. Regret bounds for prediction problems , 1999, COLT '99.
[35] Peter L. Bartlett,et al. Blackwell Approachability and No-Regret Learning are Equivalent , 2010, COLT.
[36] Byron Boots,et al. Convergence of Value Aggregation for Imitation Learning , 2018, AISTATS.
[37] Mengdi Wang,et al. Randomized Linear Programming Solves the Discounted Markov Decision Problem In Nearly-Linear Running Time , 2017, ArXiv.
[38] Shai Shalev-Shwartz,et al. Online Learning and Online Convex Optimization , 2012, Found. Trends Mach. Learn..
[39] Michael Kearns,et al. Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms , 1998, NIPS.