Online Learning in Markov Decision Processes with Changing Cost Sequences
[1] A. S. Manne. Linear Programming and Sequential Decisions, 1960.
[2] Yinyu Ye, et al. A Quadratically Convergent Polynomial Algorithm for Solving Entropy Optimization Problems, 1993, SIAM J. Optim.
[3] Dick den Hertog, et al. Interior Point Approach to Linear, Quadratic and Convex Programming: Algorithms and Complexity, 1994.
[4] Vivek S. Borkar, et al. Convex Analytic Methods in Markov Decision Processes, 2002.
[5] Marc Teboulle, et al. Mirror descent and nonlinear projected subgradient methods for convex optimization, 2003, Oper. Res. Lett.
[6] Yishay Mansour, et al. Experts in a Markov Decision Process, 2004, NIPS.
[7] Yurii Nesterov, et al. Introductory Lectures on Convex Optimization - A Basic Course, 2014, Applied Optimization.
[8] A. Nemirovski. Advances in convex optimization: conic programming, 2005.
[9] Gábor Lugosi, et al. Prediction, learning, and games, 2006.
[10] Tamás Linder, et al. The On-Line Shortest Path Problem Under Partial Monitoring, 2007, J. Mach. Learn. Res.
[11] Elad Hazan, et al. Logarithmic regret algorithms for online convex optimization, 2006, Machine Learning.
[12] Shie Mannor, et al. Markov Decision Processes with Arbitrary Reward Processes, 2008, Math. Oper. Res.
[13] Elad Hazan, et al. Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization, 2008, COLT.
[14] Yishay Mansour, et al. Online Markov Decision Processes, 2009, Math. Oper. Res.
[15] Csaba Szepesvári, et al. The Online Loop-free Stochastic Shortest-Path Problem, 2010, Annual Conference Computational Learning Theory.
[16] Hariharan Narayanan, et al. Random Walk Approach to Regret Minimization, 2010, NIPS.
[17] Yichuan Zhang, et al. Advances in Neural Information Processing Systems 25, 2012.
[18] Sham M. Kakade, et al. Towards Minimax Policies for Online Linear Optimization with Bandit Feedback, 2012, COLT.
[19] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.
[20] Shai Shalev-Shwartz, et al. Online Learning and Online Convex Optimization, 2012, Found. Trends Mach. Learn.
[21] Gergely Neu, et al. An Efficient Algorithm for Learning with Semi-bandit Feedback, 2013, ALT.
[22] Gergely Neu, et al. Online learning in episodic Markovian decision processes by relative entropy policy search, 2013, NIPS.
[23] Csaba Szepesvári, et al. Online Markov Decision Processes Under Bandit Feedback, 2010, IEEE Transactions on Automatic Control.
[24] Wang Feng, et al. Online Learning Algorithms for Big Data Analytics: A Survey, 2015.
[25] Christin Wirth, et al. Entropy Optimization And Mathematical Programming, 2016.