Markov Decision Processes with Continuous Side Information
Nan Jiang | Ambuj Tewari | Satinder P. Singh | Aditya Modi
[1] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[2] Sham M. Kakade,et al. On the sample complexity of reinforcement learning , 2003.
[3] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 1998, Machine Learning.
[4] John Langford,et al. The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information , 2007, NIPS.
[5] Thomas J. Walsh,et al. Knows what it knows: a framework for self-aware learning , 2008, ICML '08.
[6] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[7] Michael L. Littman,et al. An analysis of model-based Interval Estimation for Markov Decision Processes , 2008, J. Comput. Syst. Sci..
[8] Thomas P. Hayes,et al. Stochastic Linear Optimization under Bandit Feedback , 2008, COLT.
[9] Yoram Singer,et al. Efficient projections onto the l1-ball for learning in high dimensions , 2008, ICML '08.
[10] Peter Stone,et al. Transfer Learning for Reinforcement Learning Domains: A Survey , 2009, J. Mach. Learn. Res..
[11] Lihong Li,et al. Reinforcement Learning in Finite MDPs: PAC Analysis , 2009, J. Mach. Learn. Res..
[12] Thomas J. Walsh,et al. Exploring compact reinforcement-learning representations with linear regression , 2009, UAI.
[13] Aurélien Garivier,et al. Parametric Bandits: The Generalized Linear Case , 2010, NIPS.
[14] Wei Chu,et al. A contextual-bandit approach to personalized news article recommendation , 2010, WWW '10.
[15] Csaba Szepesvári,et al. Agnostic KWIK learning and efficient approximate reinforcement learning , 2011, COLT.
[16] Aleksandrs Slivkins,et al. Contextual Bandits with Similarity Information , 2009, COLT.
[17] Alessandro Lazaric,et al. Transfer in Reinforcement Learning: A Framework and a Survey , 2012, Reinforcement Learning.
[18] M. M. Hassan Mahmud,et al. Clustering Markov Decision Processes For Continual Transfer , 2013, ArXiv.
[19] Lihong Li,et al. Sample Complexity of Multi-task Reinforcement Learning , 2013, UAI.
[20] Jason Pazis,et al. PAC Optimal Exploration in Continuous Space Markov Decision Processes , 2013, AAAI.
[21] Eric Eaton,et al. Online Multi-Task Learning for Policy Gradient Methods , 2014, ICML.
[22] Yasin Abbasi-Yadkori,et al. Online learning in MDPs with side information , 2014, ArXiv.
[23] Shie Mannor,et al. Contextual Markov Decision Processes , 2015, ArXiv.
[24] Christoph Dann,et al. Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning , 2015, NIPS.
[25] George Dimitri Konidaris,et al. Transfer Learning Across Patient Variations with Hidden Parameter Markov Decision Processes , 2016, ArXiv.