Abstract Dynamic Programming

Dimitri P. Bertsekas, Massachusetts Institute of Technology. Athena Scientific, Belmont, Massachusetts. WWW site for book information and orders: http://www.athenasc.com

[1]  L. Shapley  Stochastic Games , 1953, Proceedings of the National Academy of Sciences.

[2]  L. G. Mitten Composition Principles for Synthesis of Optimal Multistage Processes , 1964 .

[3]  D. Blackwell Discounted Dynamic Programming , 1965 .

[4]  Onésimo Hernández-Lerma,et al.  Controlled Markov Processes , 1965 .

[5]  L. G. Mitten,et al.  ELEMENTS OF SEQUENTIAL DECISION PROCESSES , 1966 .

[6]  A. F. Veinott ON FINDING OPTIMAL POLICIES IN DISCRETE DYNAMIC PROGRAMMING WITH NO DISCOUNTING , 1966 .

[7]  E. Denardo CONTRACTION MAPPINGS IN THE THEORY UNDERLYING DYNAMIC PROGRAMMING , 1967 .

[8]  David Blackwell,et al.  Positive dynamic programming , 1967 .

[9]  D. Kleinman On an iterative technique for Riccati equation computations , 1968 .

[10]  Cyrus Derman,et al.  Finite State Markovian Decision Processes , 1970 .

[11]  James M. Ortega,et al.  Iterative Solution of Nonlinear Equations in Several Variables , 1970, Computer Science and Applied Mathematics, Academic Press.

[12]  D. Bertsekas Infinite time reachability of state-space regions by using feedback control , 1972 .

[13]  D. H. Jacobson  Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games , 1973 .

[14]  Manfred Schäl,et al.  Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal , 1975 .

[15]  D. Bertsekas Monotone mappings in dynamic programming , 1975, 1975 IEEE Conference on Decision and Control including the 14th Symposium on Adaptive Processes.

[16]  D. Bertsekas Monotone Mappings with Application in Dynamic Programming , 1977 .

[17]  Stanley R. Pliska ON THE TRANSIENT CASE FOR MARKOV DECISION CHAINS WITH GENERAL STATE SPACES , 1978 .

[18]  Gérard M. Baudet,et al.  Asynchronous Iterative Methods for Multiprocessors , 1978, JACM.

[19]  Uriel G. Rothblum,et al.  Optimal stopping, exponential utility, and linear programming , 1979, Math. Program..

[20]  P. Whittle Stability and characterisation conditions in negative programming , 1980, Journal of Applied Probability.

[21]  T. Morin Monotonicity and the principle of optimality , 1982 .

[22]  Stef Tijs,et al.  Fictitious play applied to sequences of games and discounted stochastic games , 1982 .

[23]  M. I. Henig Vector-Valued Dynamic Programming , 1983 .

[24]  Paul J. Schweitzer,et al.  Aggregation Methods for Large Markov Chains , 1983, Computer Performance and Reliability.

[25]  Uriel G. Rothblum,et al.  Multiplicative Markov Decision Chains , 1984, Math. Oper. Res..

[26]  John N. Tsitsiklis,et al.  Distributed Asynchronous Deterministic and Stochastic Gradient Optimization Algorithms , 1984, 1984 American Control Conference.

[27]  Rolf van Dawen,et al.  Negative Dynamic Programming , 1984 .

[28]  P. C. Bhakta,et al.  Some existence theorems for functional equations arising in dynamic programming, II , 1984 .

[29]  H. Robbins,et al.  A Convergence Theorem for Non Negative Almost Supermartingales and Some Applications , 1985 .

[30]  Karl-Heinz Waldmann,et al.  On Bounds for Dynamic Programs , 1985, Math. Oper. Res..

[32]  S. Verdú,et al.  Abstract dynamic programming models under commutativity conditions , 1987 .

[33]  Dimitri P. Bertsekas,et al.  Dynamic Programming: Deterministic and Stochastic Models , 1987 .

[34]  M. J. Sobel,et al.  Discounted MDP's: distribution functions and exponential utility maximization , 1987 .

[35]  R. Carraway,et al.  Theory and applications of generalized dynamic programming: An overview , 1988 .

[36]  John N. Tsitsiklis,et al.  Parallel and distributed computation , 1989 .

[37]  Richard E. Korf,et al.  Real-Time Heuristic Search , 1990, Artif. Intell..

[38]  Pierre Priouret,et al.  Adaptive Algorithms and Stochastic Approximations , 1990, Applications of Mathematics.

[39]  John N. Tsitsiklis,et al.  An Analysis of Stochastic Shortest Path Problems , 1991, Math. Oper. Res..

[40]  Anne Condon,et al.  The Complexity of Stochastic Games , 1992, Inf. Comput..

[41]  C. Atkeson,et al.  Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time , 1993, Machine Learning.

[42]  Andrew G. Barto,et al.  Convergence of Indirect Adaptive Asynchronous Value Iteration Algorithms , 1993, NIPS.

[43]  Ronald J. Williams,et al.  Analysis of Some Incremental Variants of Policy Iteration: First Steps Toward Understanding Actor-Critic Learning Systems , 1993 .

[44]  George H. John When the Best Move Isn't Optimal: Q-learning with Exploration , 1994, AAAI.

[45]  Mahesan Niranjan,et al.  On-line Q-learning using connectionist systems , 1994 .

[46]  M. R. James,et al.  Risk-Sensitive Control and Dynamic Games for Partially Observed Discrete-Time Nonlinear Systems , 1994 .

[47]  Michael I. Jordan,et al.  Reinforcement Learning with Soft State Aggregation , 1994, NIPS.

[48]  Matthias Heger,et al.  Consideration of Risk in Reinforcement Learning , 1994, ICML.

[49]  Claude-Nicolas Fiechter,et al.  Efficient reinforcement learning , 1994, COLT '94.

[50]  Geoffrey J. Gordon Stable Function Approximation in Dynamic Programming , 1995, ICML.

[51]  C. J. C. H. Watkins  Learning from Delayed Rewards , 1989, Ph.D. Thesis, University of Cambridge.

[52]  Gavin Adrian Rummery Problem solving with reinforcement learning , 1995 .

[53]  W. Fleming,et al.  Risk-Sensitive Control on an Infinite Time Horizon , 1995 .

[54]  Richard S. Sutton,et al.  Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding , 1995, NIPS.

[55]  Andrew G. Barto,et al.  Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..

[56]  S. Marcus,et al.  Risk sensitive control of Markov processes in countable state space , 1996 .

[57]  Richard S. Sutton,et al.  Reinforcement Learning with Replacing Eligibility Traces , 1996, Machine Learning.

[58]  András Lörincz,et al.  Inverse Dynamics Controllers for Robust Control: Consequences for Neurocontrollers , 1996, ICANN.

[59]  Csaba Szepesvári  Some basic facts concerning minimax sequential decision processes , 1996 .

[60]  S. Ioffe,et al.  Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming , 1996 .

[61]  Leon A. Petrosyan,et al.  Game Theory (Second Edition) , 1996 .

[62]  Matthias Heger The Loss from Imperfect Value Functions in Expectation-Based and Minimax-Based Tasks , 1996, Machine Learning.

[63]  Csaba Szepesvári,et al.  Learning and Exploitation Do Not Conflict Under Minimax Optimality , 1997, ECML.

[64]  Daniel Hernández-Hernández,et al.  Risk Sensitive Markov Decision Processes , 1997 .

[65]  Apostolos Burnetas,et al.  Optimal Adaptive Policies for Markov Decision Processes , 1997, Math. Oper. Res..

[66]  T. L. Graves,et al.  Asymptotically Efficient Adaptive Choice of Control Laws in Controlled Markov Chains , 1997 .

[67]  András Lörincz,et al.  Neurocontroller using dynamic state feedback for compensatory control , 1997, Neural Networks.

[68]  E. Fernández-Gaucherand,et al.  Risk-sensitive optimal control of hidden Markov models: structural results , 1997, IEEE Trans. Autom. Control..

[69]  Csaba Szepesvári  Module Based Reinforcement Learning for a Real Robot , 1997 .

[70]  Csaba Szepesvári,et al.  The Asymptotic Convergence-Rate of Q-learning , 1997, NIPS.

[71]  Csaba Szepesvári  Static and Dynamic Aspects of Optimal Sequential Decision Making , 1998 .

[72]  Csaba Szepesvári Non-Markovian Policies in Sequential Decision Problems , 1998, Acta Cybern..

[73]  Steven I. Marcus,et al.  Risk-sensitive and minimax control of discrete-time, finite-state Markov decision processes , 1999, Autom..

[74]  S. P. Meyn,  Risk Sensitive Optimal Control: Existence and Synthesis for Models with Unbounded Cost , 1999 .

[75]  Csaba Szepesvári,et al.  A Unified Analysis of Value-Function-Based Reinforcement-Learning Algorithms , 1999, Neural Computation.

[76]  D. Bertsekas,et al.  Stochastic Shortest Path Games , 1999 .

[77]  O. Hernández-Lerma,et al.  Further topics on discrete-time Markov control processes , 1999 .

[78]  O. Hernández-Lerma,et al.  Markov Control Processes with the Expected Total Cost Criterion: Optimality, Stability, and Transient Models , 1999 .

[80]  Sean P. Meyn,et al.  The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning , 2000, SIAM J. Control. Optim..

[81]  Eric Rogers,et al.  Uncertainty, performance, and model dependency in approximate adaptive nonlinear control , 2000, IEEE Trans. Autom. Control..

[82]  Stephen D. Patek,et al.  On terminating Markov decision processes with a risk-averse objective function , 2001, Autom..

[83]  Ying He,et al.  Simulation-Based Algorithms for Markov Decision Processes , 2002 .

[84]  Sean P. Meyn,et al.  Risk-Sensitive Optimal Control for Markov Decision Processes with Monotone Cost , 2002, Math. Oper. Res..

[85]  Peter Dayan,et al.  Q-learning , 1992, Machine Learning.

[86]  John N. Tsitsiklis,et al.  Asynchronous Stochastic Approximation and Q-Learning , 1994, Machine Learning.

[87]  Sean R Eddy,et al.  What is dynamic programming? , 2004, Nature Biotechnology.

[88]  Karl-Heinz Waldmann,et al.  Algorithms for Countable State Markov Decision Models with an Absorbing Set , 2005, SIAM J. Control. Optim..

[89]  E. J. Collins,et al.  An analysis of transient Markov decision processes , 2006, Journal of Applied Probability.

[90]  Stephen D. Patek,et al.  Partially Observed Stochastic Shortest Path Problems With Approximate Solution by Neurodynamic Programming , 2007, IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans.

[91]  H. Robbins A Stochastic Approximation Method , 1951 .

[92]  D. Bertsekas,et al.  Solution of Large Systems of Equations Using Approximate Dynamic Programming Methods , 2007 .

[93]  Bruno Scherrer,et al.  Performance Bounds for Lambda Policy Iteration and Application to the Game of Tetris , 2007 .

[94]  Dimitri P. Bertsekas,et al.  Stochastic optimal control : the discrete time case , 2007 .

[95]  Vivek S. Borkar,et al.  A Learning Algorithm for Risk-Sensitive Cost , 2008, Math. Oper. Res..

[96]  V. Borkar Stochastic Approximation: A Dynamical Systems Viewpoint , 2008, Texts and Readings in Mathematics.

[97]  Warren B. Powell  Approximate Dynamic Programming: Solving the Curses of Dimensionality , 2007, J. Wiley and Sons.

[98]  Xi-Ren Cao,et al.  Stochastic learning and optimization - A sensitivity-based approach , 2007, Annual Reviews in Control.

[99]  D. Bertsekas,et al.  Projected Equation Methods for Approximate Solution of Large Linear Systems , 2009, Journal of Computational and Applied Mathematics.

[100]  Abraham Thomas,et al.  Learning Algorithms for Markov Decision Processes , 2009 .

[101]  Dimitri P. Bertsekas,et al.  Error Bounds for Approximations from Projected Linear Equations , 2010, Math. Oper. Res..

[102]  B. Scherrer,et al.  Least-Squares Policy Iteration: Bias-Variance Trade-off in Control Problems , 2010, ICML.

[103]  Andrzej Ruszczynski,et al.  Risk-averse dynamic programming for Markov decision processes , 2010, Math. Program..

[104]  B. Scherrer,et al.  Performance bound for Approximate Optimistic Policy Iteration , 2010 .

[105]  Dimitri P. Bertsekas,et al.  Q-learning and enhanced policy iteration in discounted dynamic programming , 2010, 49th IEEE Conference on Decision and Control (CDC).

[106]  Bart De Schutter,et al.  Reinforcement Learning and Dynamic Programming Using Function Approximators , 2010 .

[107]  Dimitri P. Bertsekas  Williams-Baird Counterexample for Q-Factor Asynchronous Policy Iteration , 2010 .

[108]  D. Bertsekas Approximate policy iteration: a survey and some new methods , 2011 .

[109]  Dimitri P. Bertsekas,et al.  Temporal Difference Methods for General Projected Equations , 2011, IEEE Transactions on Automatic Control.

[110]  Dimitri P. Bertsekas  Weighted Sup-Norm Contractions in Dynamic Programming: A Review and Some New Applications , 2012 .

[111]  Bruno Scherrer On the Use of Non-Stationary Policies for Infinite-Horizon Discounted Markov Decision Processes , 2012, ArXiv.

[112]  Huizhen Yu,et al.  Least Squares Temporal Difference Methods: An Analysis under General Conditions , 2012, SIAM J. Control. Optim..

[113]  Frank L. Lewis,et al.  Optimal Adaptive Control and Differential Games by Reinforcement Learning Principles , 2012 .

[114]  Frank L. Lewis,et al.  Reinforcement Learning And Approximate Dynamic Programming For Feedback Control , 2016 .

[115]  Bruno Scherrer,et al.  On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes , 2012, NIPS.

[116]  Dimitri P. Bertsekas,et al.  On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems , 2013, Math. Oper. Res..

[117]  Dimitri P. Bertsekas,et al.  Q-learning and policy iteration algorithms for stochastic shortest path problems , 2012, Annals of Operations Research.

[118]  Uriel G. Rothblum,et al.  (Approximate) iterated successive approximations algorithm for sequential decision processes , 2013, Ann. Oper. Res..

[119]  Huizhen Yu  Stochastic Shortest Path Games and Q-Learning , 2014, arXiv:1412.8570.

[120]  Özlem Çavus,et al.  Risk-Averse Control of Undiscounted Transient Markov Models , 2012, SIAM J. Control. Optim..

[121]  D. Bertsekas  Distributed Dynamic Programming , 1982, IEEE Transactions on Automatic Control.