Scalable Internal-State Policy-Gradient Methods for POMDPs