The Online Loop-free Stochastic Shortest-Path Problem
[1] Shie Mannor, et al. Arbitrarily modulated Markov decision processes, 2009, Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference.
[2] Yishay Mansour, et al. Online Markov Decision Processes, 2009, Math. Oper. Res.
[3] Shie Mannor, et al. Online learning in Markov decision processes with arbitrarily changing rewards and transitions, 2009, 2009 International Conference on Game Theory for Networks.
[4] Dimitri P. Bertsekas, et al. Neuro-Dynamic Programming, 2009, Encyclopedia of Optimization.
[5] Shie Mannor, et al. Markov Decision Processes with Arbitrary Reward Processes, 2008, Math. Oper. Res.
[6] Thomas P. Hayes, et al. High-Probability Regret Bounds for Bandit Online Linear Optimization, 2008, COLT.
[7] Tamás Linder, et al. Tracking the Best Quantizer, 2005, IEEE Transactions on Information Theory.
[8] Thomas P. Hayes, et al. The Price of Bandit Information for Online Optimization, 2007, NIPS.
[9] Tamás Linder, et al. The On-Line Shortest Path Problem Under Partial Monitoring, 2007, J. Mach. Learn. Res.
[10] Peter Auer, et al. Hannan Consistency in On-Line Learning in Case of Unbounded Losses Under Partial Monitoring, 2006, ALT.
[11] Gábor Lugosi, et al. Prediction, learning, and games, 2006.
[12] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[13] Yishay Mansour, et al. Experts in a Markov Decision Process, 2004, NIPS.
[14] Avrim Blum, et al. Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary, 2004, COLT.
[15] Baruch Awerbuch, et al. Adaptive routing with end-to-end feedback: distributed learning and geometric approaches, 2004, STOC '04.
[16] Mark Herbster, et al. Tracking the Best Expert, 1995, Machine Learning.
[17] Claudio Gentile, et al. Adaptive and Self-Confident On-Line Learning Algorithms, 2000, J. Comput. Syst. Sci.
[18] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[19] Neri Merhav, et al. Low-complexity sequential lossless coding for piecewise-stationary memoryless sources, 1998, IEEE Trans. Inf. Theory.
[20] Vladimir Vovk. Derandomizing Stochastic Prediction Strategies, 1997, COLT '97.
[21] Frans M. J. Willems. Coding for a binary independent piecewise-identically-distributed source, 1996, IEEE Trans. Inf. Theory.
[22] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1997, EuroCOLT.
[23] Nicolò Cesa-Bianchi, et al. Gambling in a rigged casino: The adversarial multi-armed bandit problem, 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.
[24] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[25] Vladimir Vovk. Aggregating strategies, 1990, COLT '90.
[26] Manfred K. Warmuth, et al. The weighted majority algorithm, 1989, 30th Annual Symposium on Foundations of Computer Science.