Abstract Dynamic Programming
[1] L. Shapley,et al. Stochastic Games* , 1953, Proceedings of the National Academy of Sciences.
[2] L. G. Mitten. Composition Principles for Synthesis of Optimal Multistage Processes , 1964 .
[3] D. Blackwell. Discounted Dynamic Programming , 1965 .
[4] Onésimo Hernández-Lerma,et al. Controlled Markov Processes , 1965 .
[5] L. G. Mitten,et al. ELEMENTS OF SEQUENTIAL DECISION PROCESSES , 1966 .
[6] A. F. Veinott. ON FINDING OPTIMAL POLICIES IN DISCRETE DYNAMIC PROGRAMMING WITH NO DISCOUNTING , 1966 .
[7] E. Denardo. CONTRACTION MAPPINGS IN THE THEORY UNDERLYING DYNAMIC PROGRAMMING , 1967 .
[8] David Blackwell,et al. Positive dynamic programming , 1967 .
[9] D. Kleinman. On an iterative technique for Riccati equation computations , 1968 .
[10] Cyrus Derman,et al. Finite State Markovian Decision Processes , 1970 .
[11] James M. Ortega,et al. Iterative solution of nonlinear equations in several variables , 2014, Computer science and applied mathematics.
[12] D. Bertsekas. Infinite time reachability of state-space regions by using feedback control , 1972 .
[13] Rhodes,et al. Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games , 1973 .
[14] Manfred Schäl. Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal , 1975 .
[15] D. Bertsekas. Monotone mappings in dynamic programming , 1975, 1975 IEEE Conference on Decision and Control including the 14th Symposium on Adaptive Processes.
[16] D. Bertsekas. Monotone Mappings with Application in Dynamic Programming , 1977 .
[17] Stanley R. Pliska. ON THE TRANSIENT CASE FOR MARKOV DECISION CHAINS WITH GENERAL STATE SPACES , 1978 .
[18] Gérard M. Baudet,et al. Asynchronous Iterative Methods for Multiprocessors , 1978, JACM.
[19] Uriel G. Rothblum,et al. Optimal stopping, exponential utility, and linear programming , 1979, Math. Program..
[20] P. Whittle. Stability and characterisation conditions in negative programming , 1980, Journal of Applied Probability.
[21] T. Morin. Monotonicity and the principle of optimality , 1982 .
[22] Stef Tijs,et al. Fictitious play applied to sequences of games and discounted stochastic games , 1982 .
[23] M. I. Henig. Vector-Valued Dynamic Programming , 1983 .
[24] Paul J. Schweitzer,et al. Aggregation Methods for Large Markov Chains , 1983, Computer Performance and Reliability.
[25] Uriel G. Rothblum,et al. Multiplicative Markov Decision Chains , 1984, Math. Oper. Res..
[26] John N. Tsitsiklis,et al. Distributed Asynchronous Deterministic and Stochastic Gradient Optimization Algorithms , 1984, 1984 American Control Conference.
[27] Rolf van Dawen,et al. Negative Dynamic Programming , 1984 .
[28] P. C. Bhakta,et al. Some existence theorems for functional equations arising in dynamic programming, II , 1984 .
[29] H. Robbins,et al. A Convergence Theorem for Non Negative Almost Supermartingales and Some Applications , 1985 .
[30] Karl-Heinz Waldmann,et al. On Bounds for Dynamic Programs , 1985, Math. Oper. Res..
[32] S. Verdú,et al. Abstract dynamic programming models under commutativity conditions , 1987 .
[33] Dimitri P. Bertsekas,et al. Dynamic Programming: Deterministic and Stochastic Models , 1987 .
[34] M. J. Sobel,et al. Discounted MDP's: distribution functions and exponential utility maximization , 1987 .
[35] R. Carraway,et al. Theory and applications of generalized dynamic programming: An overview , 1988 .
[36] John N. Tsitsiklis,et al. Parallel and distributed computation , 1989 .
[37] Richard E. Korf,et al. Real-Time Heuristic Search , 1990, Artif. Intell..
[38] Pierre Priouret,et al. Adaptive Algorithms and Stochastic Approximations , 1990, Applications of Mathematics.
[39] John N. Tsitsiklis,et al. An Analysis of Stochastic Shortest Path Problems , 1991, Math. Oper. Res..
[40] Anne Condon,et al. The Complexity of Stochastic Games , 1992, Inf. Comput..
[41] C. Atkeson,et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time , 1993, Machine Learning.
[42] Andrew G. Barto,et al. Convergence of Indirect Adaptive Asynchronous Value Iteration Algorithms , 1993, NIPS.
[43] Ronald J. Williams,et al. Analysis of Some Incremental Variants of Policy Iteration: First Steps Toward Understanding Actor-Cr , 1993 .
[44] George H. John. When the Best Move Isn't Optimal: Q-learning with Exploration , 1994, AAAI.
[45] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[46] S.,et al. Risk-Sensitive Control and Dynamic Games for Partially Observed Discrete-Time Nonlinear Systems , 1994 .
[47] Michael I. Jordan,et al. Reinforcement Learning with Soft State Aggregation , 1994, NIPS.
[48] Matthias Heger,et al. Consideration of Risk in Reinforcement Learning , 1994, ICML.
[49] Claude-Nicolas Fiechter,et al. Efficient reinforcement learning , 1994, COLT '94.
[50] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[51] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[52] Gavin Adrian Rummery. Problem solving with reinforcement learning , 1995 .
[53] W. Fleming,et al. Risk-Sensitive Control on an Infinite Time Horizon , 1995 .
[54] Richard S. Sutton,et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding , 1995, NIPS.
[55] Andrew G. Barto,et al. Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..
[56] S. Marcus,et al. Risk sensitive control of Markov processes in countable state space , 1996 .
[57] Richard S. Sutton,et al. Reinforcement Learning with Replacing Eligibility Traces , 2005, Machine Learning.
[58] András Lőrincz,et al. Inverse Dynamics Controllers for Robust Control: Consequences for Neurocontrollers , 1996, ICANN.
[59] Csaba Szepesvári. Some basic facts concerning minimax sequential decision processes , 1996 .
[60] S. Ioffe,et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming , 1996 .
[61] Leon A. Petrosyan,et al. Game Theory (Second Edition) , 1996 .
[62] Matthias Heger. The Loss from Imperfect Value Functions in Expectation-Based and Minimax-Based Tasks , 1996, Machine Learning.
[63] Csaba Szepesvári,et al. Learning and Exploitation Do Not Conflict Under Minimax Optimality , 1997, ECML.
[64] Daniel Hernández-Hernández,et al. Risk Sensitive Markov Decision Processes , 1997 .
[65] Apostolos Burnetas,et al. Optimal Adaptive Policies for Markov Decision Processes , 1997, Math. Oper. Res..
[66] T. L. Graves,et al. Asymptotically Efficient Adaptive Choice of Control Laws in Controlled Markov Chains , 1997 .
[67] András Lőrincz,et al. Neurocontroller using dynamic state feedback for compensatory control , 1997, Neural Networks.
[68] E. Fernández-Gaucherand,et al. Risk-sensitive optimal control of hidden Markov models: structural results , 1997, IEEE Trans. Autom. Control..
[69] Csaba Szepesvári. Module Based Reinforcement Learning for a Real Robot , 1997 .
[70] Csaba Szepesvári,et al. The Asymptotic Convergence-Rate of Q-learning , 1997, NIPS.
[71] Csaba Szepesvári. Static and Dynamic Aspects of Optimal Sequential Decision Making , 1998 .
[72] Csaba Szepesvári. Non-Markovian Policies in Sequential Decision Problems , 1998, Acta Cybern..
[73] Steven I. Marcus,et al. Risk-sensitive and minimax control of discrete-time, finite-state Markov decision processes , 1999, Autom..
[74] S. P. Meyn,et al. Risk Sensitive Optimal Control: Existence and Synthesis for Models with Unbounded Cost , 1999 .
[75] Csaba Szepesvári,et al. A Unified Analysis of Value-Function-Based Reinforcement-Learning Algorithms , 1999, Neural Computation.
[76] D. Bertsekas,et al. Stochastic Shortest Path Games , 1999 .
[77] O. Hernández-Lerma,et al. Further topics on discrete-time Markov control processes , 1999 .
[78] O. Hernández-Lerma,et al. Markov Control Processes with the Expected Total Cost Criterion: Optimality, Stability, and Transient Models , 1999 .
[80] Sean P. Meyn,et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning , 2000, SIAM J. Control. Optim..
[81] Eric Rogers,et al. Uncertainty, performance, and model dependency in approximate adaptive nonlinear control , 2000, IEEE Trans. Autom. Control..
[82] Stephen D. Patek,et al. On terminating Markov decision processes with a risk-averse objective function , 2001, Autom..
[83] Ying He,et al. Simulation-Based Algorithms for Markov Decision Processes , 2002 .
[84] Sean P. Meyn,et al. Risk-Sensitive Optimal Control for Markov Decision Processes with Monotone Cost , 2002, Math. Oper. Res..
[85] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[86] John N. Tsitsiklis,et al. Asynchronous Stochastic Approximation and Q-Learning , 1994, Machine Learning.
[87] Sean R Eddy,et al. What is dynamic programming? , 2004, Nature Biotechnology.
[88] Karl-Heinz Waldmann,et al. Algorithms for Countable State Markov Decision Models with an Absorbing Set , 2005, SIAM J. Control. Optim..
[89] E. J. Collins,et al. An analysis of transient Markov decision processes , 2006, Journal of Applied Probability.
[90] Stephen D. Patek,et al. Partially Observed Stochastic Shortest Path Problems With Approximate Solution by Neurodynamic Programming , 2007, IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans.
[91] H. Robbins. A Stochastic Approximation Method , 1951 .
[92] D. Bertsekas,et al. Solution of Large Systems of Equations Using Approximate Dynamic Programming Methods , 2007 .
[93] Bruno Scherrer,et al. Performance Bounds for Lambda Policy Iteration and Application to the Game of Tetris , 2007 .
[94] Dimitri P. Bertsekas,et al. Stochastic optimal control : the discrete time case , 2007 .
[95] Vivek S. Borkar,et al. A Learning Algorithm for Risk-Sensitive Cost , 2008, Math. Oper. Res..
[96] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008, Texts and Readings in Mathematics.
[97] Panos M. Pardalos,et al. Approximate dynamic programming: solving the curses of dimensionality , 2009, Optim. Methods Softw..
[98] Xi-Ren Cao,et al. Stochastic learning and optimization - A sensitivity-based approach , 2007, Annual Reviews in Control.
[99] D. Bertsekas,et al. Projected Equation Methods for Approximate Solution of Large Linear Systems , 2009, Journal of Computational and Applied Mathematics.
[100] Abraham Thomas,et al. LEARNING ALGORITHMS FOR MARKOV DECISION PROCESSES , 2009 .
[101] Dimitri P. Bertsekas,et al. Error Bounds for Approximations from Projected Linear Equations , 2010, Math. Oper. Res..
[102] B. Scherrer,et al. Least-Squares Policy Iteration: Bias-Variance Trade-off in Control Problems , 2010, ICML.
[103] Andrzej Ruszczynski,et al. Risk-averse dynamic programming for Markov decision processes , 2010, Math. Program..
[104] B. Scherrer,et al. Performance bound for Approximate Optimistic Policy Iteration , 2010 .
[105] Dimitri P. Bertsekas,et al. Q-learning and enhanced policy iteration in discounted dynamic programming , 2010, 49th IEEE Conference on Decision and Control (CDC).
[106] Bart De Schutter,et al. Reinforcement Learning and Dynamic Programming Using Function Approximators , 2010 .
[107] Dimitri P. Bertsekas. Williams-Baird Counterexample for Q-Factor Asynchronous Policy Iteration , 2010 .
[108] D. Bertsekas. Approximate policy iteration: a survey and some new methods , 2011 .
[109] Dimitri P. Bertsekas,et al. Temporal Difference Methods for General Projected Equations , 2011, IEEE Transactions on Automatic Control.
[110] Dimitri P. Bertsekas. Weighted Sup-Norm Contractions in Dynamic Programming: A Review and Some New Applications , 2012 .
[111] Bruno Scherrer. On the Use of Non-Stationary Policies for Infinite-Horizon Discounted Markov Decision Processes , 2012, ArXiv.
[112] Huizhen Yu,et al. Least Squares Temporal Difference Methods: An Analysis under General Conditions , 2012, SIAM J. Control. Optim..
[113] Frank L. Lewis,et al. Optimal Adaptive Control and Differential Games by Reinforcement Learning Principles , 2012 .
[114] Frank L. Lewis,et al. Reinforcement Learning And Approximate Dynamic Programming For Feedback Control , 2016 .
[115] Bruno Scherrer,et al. On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes , 2012, NIPS.
[116] Dimitri P. Bertsekas,et al. On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems , 2013, Math. Oper. Res..
[117] Dimitri P. Bertsekas,et al. Q-learning and policy iteration algorithms for stochastic shortest path problems , 2012, Annals of Operations Research.
[118] Uriel G. Rothblum,et al. (Approximate) iterated successive approximations algorithm for sequential decision processes , 2013, Ann. Oper. Res..
[119] Huizhen Yu. Stochastic Shortest Path Games and Q-Learning , 2014, 1412.8570.
[120] Özlem Çavuş,et al. Risk-Averse Control of Undiscounted Transient Markov Models , 2012, SIAM J. Control. Optim..
[121] J. Walrand,et al. Distributed Dynamic Programming , 1982 .