Discovering temporally extended features for reinforcement learning in domains with delayed causalities
[1] Eric A. Hansen, et al. An Improved Policy Iteration Algorithm for Partially Observable MDPs, 1997, NIPS.
[2] Richard S. Sutton, et al. Predictive Representations of State, 2001, NIPS.
[3] Jorge Nocedal, et al. Representations of quasi-Newton matrices and their use in limited memory methods, 1994, Math. Program.
[4] Alborz Geramifard, et al. Adaptive Planning for Markov Decision Processes with Uncertain Transition Models via Incremental Feature Dependency Discovery, 2012, ECML/PKDD.
[5] Marcus Hutter, et al. Context tree maximizing reinforcement learning, 2012, AAAI.
[6] Alborz Geramifard, et al. Online Discovery of Feature Dependencies, 2011, ICML.
[7] Andrew McCallum, et al. Efficiently Inducing Features of Conditional Random Fields, 2002, UAI.
[8] John D. Lafferty, et al. Inducing Features of Random Fields, 1995, IEEE Trans. Pattern Anal. Mach. Intell.
[9] Andrew Y. Ng, et al. Regularization and feature selection in least-squares temporal difference learning, 2009, ICML.
[10] Andrew McCallum, et al. Reinforcement learning with selective perception and hidden state, 1996.
[11] Marcus Hutter, et al. Q-learning for history-based reinforcement learning, 2013, ACML.
[12] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[13] Joel Veness, et al. Reinforcement Learning via AIXI Approximation, 2010, AAAI.
[14] W. Lovejoy. A survey of algorithmic methods for partially observed Markov decision processes, 1991.
[15] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.