Fast Bellman Updates for Robust MDPs
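The references below center on computing the inner minimization of the robust Bellman operator efficiently. As a hedged illustration of the topic (not the paper's own homotopy algorithm), the sketch below implements the standard O(S log S) sorting-based solution of that inner problem for an unweighted, sa-rectangular ℓ1 ambiguity set around a nominal model; all names (worst_case_l1, robust_bellman_update, P_bar, xi) are illustrative assumptions, not the paper's API.

```python
import numpy as np

def worst_case_l1(z, p_bar, xi):
    """Worst-case expectation of z over {p in simplex : ||p - p_bar||_1 <= xi}.

    Standard O(S log S) approach: shift up to xi/2 probability mass away from
    the states with the largest z onto the state with the smallest z.
    """
    order = np.argsort(z)                  # state indices, ascending in z
    p = np.asarray(p_bar, dtype=float).copy()
    eps = min(xi / 2.0, 1.0 - p[order[0]])
    p[order[0]] += eps                     # add mass to the cheapest state
    i = len(p) - 1
    while eps > 1e-12 and i > 0:
        removed = min(eps, p[order[i]])    # drain the most expensive states
        p[order[i]] -= removed
        eps -= removed
        i -= 1
    return float(p @ np.asarray(z, dtype=float))

def robust_bellman_update(v, P_bar, R, xi, gamma=0.95):
    """One robust value-iteration sweep under sa-rectangular L1 ambiguity.

    P_bar: nominal transitions, shape (S, A, S); R: rewards, shape (S, A).
    """
    S, A = R.shape
    v_new = np.empty(S)
    for s in range(S):
        v_new[s] = max(R[s, a] + gamma * worst_case_l1(v, P_bar[s, a], xi)
                       for a in range(A))
    return v_new

# Tiny usage example on a random 3-state, 2-action model; the robust Bellman
# operator is a gamma-contraction, so iterating converges to a fixed point.
rng = np.random.default_rng(0)
P_bar = rng.dirichlet(np.ones(3), size=(3, 2))
R = rng.random((3, 2))
v = np.zeros(3)
for _ in range(200):
    v = robust_bellman_update(v, P_bar, R, xi=0.2)
```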
[1] David L. Donoho,et al. Solution of ℓ1 Minimization Problems by LARS/Homotopy Methods , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.
[2] E. Ordentlich,et al. Inequalities for the L1 Deviation of the Empirical Distribution , 2003 .
[3] John N. Tsitsiklis,et al. Introduction to linear optimization , 1997, Athena scientific optimization and computation series.
[4] Paul H. Zipkin,et al. Foundations of Inventory Management , 2000 .
[5] J. Tsitsiklis,et al. Robust, risk-sensitive, and data-driven control of Markov decision processes , 2007 .
[6] Laurent El Ghaoui,et al. An Homotopy Algorithm for the Lasso with Online Observations , 2008, NIPS.
[7] Marek Petrik,et al. RAAM: The Benefits of Robustness in Approximating Aggregated MDPs in Reinforcement Learning , 2014, NIPS.
[8] Shie Mannor,et al. Scaling Up Robust MDPs using Function Approximation , 2014, ICML.
[9] Marek Petrik,et al. Safe Policy Improvement by Minimizing Robust Baseline Regret , 2016, NIPS.
[10] Yoram Singer,et al. Efficient projections onto the ℓ1-ball for learning in high dimensions , 2008, ICML.
[11] Shie Mannor,et al. Parametric regret in uncertain Markov decision processes , 2009, Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference.
[12] Alexandre M. Bayen,et al. Projected sub-gradient with ℓ1 or simplex constraints via isotonic regression , 2015, 2015 54th IEEE Conference on Decision and Control (CDC).
[13] Kevin P. Murphy,et al. Machine learning - a probabilistic perspective , 2012, Adaptive computation and machine learning series.
[14] Shie Mannor,et al. Lightning Does Not Strike Twice: Robust MDPs with Coupled Uncertainty , 2012, ICML.
[15] Daniel Kuhn,et al. Robust Markov Decision Processes , 2013, Math. Oper. Res..
[16] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[17] Marek Petrik,et al. Approximate Dynamic Programming By Minimizing Distributionally Robust Bounds , 2012, ICML.
[18] Robert J. Vanderbei,et al. Linear Programming: Foundations and Extensions , 1998, Kluwer international series in operations research and management science.
[19] Eric R. Ziegel,et al. The Elements of Statistical Learning , 2003, Technometrics.
[20] Laurent El Ghaoui,et al. Robust Control of Markov Decision Processes with Uncertain Transition Matrices , 2005, Oper. Res..
[21] Thomas G. Dietterich,et al. PAC optimal MDP planning with application to invasive species management , 2015, J. Mach. Learn. Res..
[22] Lihong Li,et al. Reinforcement Learning in Finite MDPs: PAC Analysis , 2009, J. Mach. Learn. Res..
[23] Justin K. Romberg,et al. Dantzig selector homotopy with dynamic measurements , 2009, Electronic Imaging.
[24] Garud Iyengar,et al. Robust Dynamic Programming , 2005, Math. Oper. Res..
[25] Andrew J. Schaefer,et al. Robust Modified Policy Iteration , 2013, INFORMS J. Comput..
[26] Peter Bro Miltersen,et al. Strategy Iteration Is Strongly Polynomial for 2-Player Turn-Based Stochastic Games with a Constant Discount Factor , 2010, JACM.
[27] Shie Mannor,et al. The Robustness-Performance Tradeoff in Markov Decision Processes , 2006, NIPS.
[28] Scott Sanner,et al. Real-time dynamic programming for Markov decision processes with imprecise probabilities , 2016, Artif. Intell..
[29] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .