Dynamic Programming and Optimal Control, Two Volume Set