Exploration of Multi-State Environments: Local Measures and Back-Propagation of Uncertainty
[1] Ronald A. Howard,et al. Dynamic Programming and Markov Processes , 1960 .
[2] G. W. Snedecor. Statistical Methods , 1964 .
[3] D. Naidu,et al. Optimal Control Systems , 2018 .
[4] J. J. Martin. Bayesian Decision Problems and Markov Chains , 1967 .
[5] Dimitri Bertsekas,et al. Distributed dynamic programming , 1981, 1981 20th IEEE Conference on Decision and Control including the Symposium on Adaptive Processes.
[6] Y. Bar-Shalom. Stochastic dynamic programming: Caution and probing , 1981 .
[7] P. Kumar,et al. Optimal adaptive controllers for unknown Markov chains , 1982 .
[8] P. Kumar,et al. A new family of optimal adaptive controllers for Markov chains , 1982 .
[9] Mitsuo Sato,et al. Learning control of finite Markov chains with unknown transition probabilities , 1982 .
[10] Donald A. Berry,et al. Bandit Problems: Sequential Allocation of Experiments , 1986 .
[11] Patchigolla Kiran Kumar,et al. A Survey of Some Results in Stochastic Adaptive Control , 1985 .
[12] Mitsuo Sato,et al. An asymptotically optimal learning controller for finite Markov chains with unknown transition probabilities , 1985 .
[13] R. Larsen,et al. An introduction to mathematical statistics and its applications (2nd edition) , 1987, The Mathematical Gazette.
[14] Mitsuo Sato,et al. Learning control of finite Markov chains with an explicit trade-off between estimation and control , 1988, IEEE Trans. Syst. Man Cybern..
[15] Richard S. Sutton,et al. Integrated Modeling and Control Based on Reinforcement Learning and Dynamic Programming , 1990, NIPS 1990.
[16] J. Bather,et al. Multi‐Armed Bandit Allocation Indices , 1990 .
[17] Andrew W. Moore,et al. Efficient memory-based learning for robot control , 1990 .
[18] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[19] Richard S. Sutton,et al. Reinforcement Learning is Direct Adaptive Optimal Control , 1992, 1991 American Control Conference.
[20] Sebastian Thrun,et al. On Planning And Exploration In Non-Discrete Environments , 1991 .
[21] Richard S. Sutton,et al. Dyna, an integrated architecture for learning, planning, and reacting , 1990, SGAR.
[22] Sebastian Thrun,et al. Active Exploration in Dynamic Environments , 1991, NIPS.
[23] Jürgen Schmidhuber,et al. Curious model-building control systems , 1991, [Proceedings] 1991 IEEE International Joint Conference on Neural Networks.
[24] R. Sutton. Introduction: The Challenge of Reinforcement Learning , 1992 .
[25] Sebastian Thrun,et al. The role of exploration in learning control , 1992 .
[26] Sebastian Thrun,et al. Efficient Exploration In Reinforcement Learning , 1992 .
[27] Leslie Pack Kaelbling,et al. Learning in embedded systems , 1993 .
[28] R. Tibshirani,et al. An introduction to the bootstrap , 1993 .
[29] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[30] John N. Tsitsiklis,et al. Asynchronous stochastic approximation and Q-learning , 1994, Mach. Learn..
[31] Claude-Nicolas Fiechter,et al. Efficient reinforcement learning , 1994, COLT '94.
[32] Andrew G. Barto,et al. Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..
[33] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..
[34] Nicolas Meuleau. Le dilemme entre exploration et exploitation dans l'apprentissage par renforcement : optimisation adaptative des modeles de decision multi-etats , 1996 .
[35] Leslie Pack Kaelbling,et al. The NSF Workshop on Reinforcement Learning: Summary and Observations , 1996 .
[36] Leslie Pack Kaelbling,et al. The National Science Foundation Workshop on Reinforcement Learning , 1996, AI Mag..
[37] V. Borkar. Asynchronous Stochastic Approximations , 1998 .
[38] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[39] Peter Dayan,et al. Technical Note: Q-Learning , 2004, Machine Learning.
[40] R. Simmons,et al. The effect of representation and knowledge on goal-directed exploration with reinforcement-learning algorithms , 2004, Machine Learning.
[41] Andrew W. Moore,et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time , 1993, Machine Learning.
[42] Dana H. Ballard,et al. Learning to perceive and act by trial and error , 1991, Machine Learning.