Q-learning for history-based reinforcement learning
[1] Tim Hesterberg, et al. Monte Carlo Strategies in Scientific Computing, 2002, Technometrics.
[2] Stephan Timmer, et al. Safe Q-Learning on Complete History Spaces, 2007, ECML.
[3] Andrew McCallum, et al. Learning to Use Selective Attention and Short-Term Memory in Sequential Tasks, 1996.
[4] Jonathan Baxter, et al. Scaling Internal-State Policy-Gradient Methods for POMDPs, 2002.
[5] Marcus Hutter, et al. Consistency of Feature Markov Processes, 2010, ALT.
[6] J. Rissanen. Modeling by Shortest Data Description, 1978, Automatica.
[7] Ronald Parr, et al. Linear Complementarity for Regularized Policy Evaluation and Improvement, 2010, NIPS.
[8] Lihong Li, et al. Reinforcement Learning in Finite MDPs: PAC Analysis, 2009, J. Mach. Learn. Res.
[9] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[10] Chris Watkins, et al. Learning from delayed rewards, 1989.
[11] Frans M. J. Willems, et al. The context-tree weighting method: basic properties, 1995, IEEE Trans. Inf. Theory.
[12] Eduardo F. Morales, et al. An Introduction to Reinforcement Learning, 2011.
[13] Andrew McCallum, et al. Instance-Based Utile Distinctions for Reinforcement Learning with Hidden State, 1995, ICML.
[14] Matthew W. Hoffman, et al. Finite-Sample Analysis of Lasso-TD, 2011, ICML.
[15] Joel Veness, et al. A Monte-Carlo AIXI Approximation, 2009, J. Artif. Intell. Res.
[16] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[17] Shie Mannor, et al. Regularized Fitted Q-iteration: Application to Planning, 2008, EWRL.
[18] Shalabh Bhatnagar, et al. Toward Off-Policy Learning Control with Function Approximation, 2010, ICML.
[19] Marcus Hutter, et al. Feature Reinforcement Learning in Practice, 2011, EWRL.
[20] H. Robbins, et al. A Stochastic Approximation Method, 1951.
[21] Benjamin Van Roy, et al. Universal Reinforcement Learning, 2007, IEEE Trans. Inf. Theory.
[22] Andrew Y. Ng, et al. Regularization and feature selection in least-squares temporal difference learning, 2009, ICML '09.
[23] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[24] Marcus Hutter, et al. Feature Reinforcement Learning: Part I. Unstructured MDPs, 2009, J. Artif. Gen. Intell.
[25] Matthieu Geist, et al. A Dantzig Selector Approach to Temporal Difference Learning, 2012, ICML.
[26] Douglas Aberdeen, et al. Scalable Internal-State Policy-Gradient Methods for POMDPs, 2002, ICML.
[27] Marcus Hutter, et al. Context Tree Maximizing, 2012, AAAI.
[28] Marcus Hutter, et al. Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability, 2005, Texts in Theoretical Computer Science, An EATCS Series.
[29] Marcus Hutter, et al. Feature Reinforcement Learning using Looping Suffix Trees, 2012, EWRL.