Planning and Learning with Stochastic Action Sets
Craig Boutilier | Avinatan Hassidim | Yishay Mansour | Amit Daniely | Ofer Meshi | Dale Schuurmans | Alon Cohen | Martin Mladenov
[1] U. Meister, et al. A polynomial time bound for Howard's policy improvement algorithm, 1986.
[2] J. G. Pierce, et al. Geometric Algorithms and Combinatorial Optimization, 2016.
[3] P. Tseng. Solving H-horizon, stationary Markov decision problems in time proportional to log(H), 1990.
[4] Mihalis Yannakakis, et al. Shortest Paths Without a Map, 1989, Theor. Comput. Sci.
[5] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[6] John N. Tsitsiklis, et al. Stochastic shortest path problems with recourse, 1996, Networks.
[7] Ravi Kumar, et al. On targeting Markov segments, 1999, STOC '99.
[8] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[9] David R. Karger, et al. Route Planning under Uncertainty: The Canadian Traveller Problem, 2008, AAAI.
[10] Varun Kanade, et al. Sleeping Experts and Bandits with Stochastic Action Availability and Adversarial Rewards, 2009, AISTATS.
[11] Zheng Chen, et al. A Markov chain model for integrating behavioral targeting into contextual advertising, 2009, KDD Workshop on Data Mining and Audience Intelligence for Advertising.
[12] Vahab S. Mirrokni, et al. Mining advertiser-specific user behavior using adfactors, 2010, WWW '10.
[13] Robert D. Kleinberg, et al. Regret bounds for sleeping experts and bandits, 2010, Machine Learning.
[14] Yinyu Ye, et al. The Simplex and Policy-Iteration Methods Are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate, 2011, Math. Oper. Res.
[15] Anton Schwaighofer, et al. Budget Optimization for Sponsored Search: Censored Learning in MDPs, 2012, UAI.
[16] Vahab S. Mirrokni, et al. Budget Optimization for Online Campaigns with Positive Carryover Effects, 2012, WINE.
[17] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[18] Peter Bro Miltersen, et al. Strategy Iteration Is Strongly Polynomial for 2-Player Turn-Based Stochastic Games with a Constant Discount Factor, 2010, JACM.
[19] David Silver, et al. Concurrent Reinforcement Learning from Customer Interactions, 2013, ICML.
[20] Philip S. Thomas, et al. Personalized Ad Recommendation Systems for Life-Time Value Optimization with Guarantees, 2015, IJCAI.
[21] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[22] Craig Boutilier, et al. Logistic Markov Decision Processes, 2017, IJCAI.