Dynamic Programming and Optimal Control, 3rd Edition, Volume II
[1] L. Shapley,et al. Stochastic Games , 1953, Proceedings of the National Academy of Sciences.
[2] Z. Rekasius,et al. Suboptimal design of intentionally nonlinear controllers , 1964 .
[3] E. Denardo. CONTRACTION MAPPINGS IN THE THEORY UNDERLYING DYNAMIC PROGRAMMING , 1967 .
[4] A. L. Samuel,et al. Some studies in machine learning using the game of checkers. II: recent progress , 1967 .
[5] Charlotte Striebel,et al. Optimal Control of Discrete Time Stochastic Systems , 1975 .
[6] D. Bertsekas. Monotone mappings in dynamic programming , 1975, 1975 IEEE Conference on Decision and Control including the 14th Symposium on Adaptive Processes.
[7] R. Rockafellar. Monotone Operators and the Proximal Point Algorithm , 1976 .
[8] D. Bertsekas. Monotone Mappings with Application in Dynamic Programming , 1977 .
[9] Uriel G. Rothblum,et al. Optimal stopping, exponential utility, and linear programming , 1979, Math. Program..
[10] George N. Saridis,et al. An Approximation Theory of Optimal Control for Trainable Manipulators , 1979, IEEE Transactions on Systems, Man, and Cybernetics.
[11] Dimitri Bertsekas,et al. Distributed dynamic programming , 1981, 1981 20th IEEE Conference on Decision and Control including the Symposium on Adaptive Processes.
[12] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[13] Dimitri P. Bertsekas,et al. Distributed asynchronous computation of fixed points , 1983, Math. Program..
[14] Uriel G. Rothblum,et al. Multiplicative Markov Decision Chains , 1984, Math. Oper. Res..
[15] Peter W. Glynn,et al. Likelihood ratio gradient estimation: an overview , 1987, WSC '87.
[16] C. Watkins. Learning from delayed rewards , 1989 .
[17] John N. Tsitsiklis,et al. Parallel and distributed computation , 1989 .
[18] Donald L. Iglehart,et al. Importance sampling for stochastic simulations , 1989 .
[19] Pierre Priouret,et al. Adaptive Algorithms and Stochastic Approximations , 1990, Applications of Mathematics.
[20] Richard W. Cottle,et al. The Linear Complementarity Problem , 1992 .
[21] Gerald Tesauro,et al. Practical Issues in Temporal Difference Learning , 1992, Mach. Learn..
[22] Anton Schwartz,et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.
[23] L. C. Baird,et al. Reinforcement learning in continuous time: advantage updating , 1994, Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN'94).
[24] Michael C. Fu,et al. Smoothed perturbation analysis derivative estimation for Markov chains , 1994, Oper. Res. Lett..
[25] Michael I. Jordan,et al. Massachusetts Institute of Technology Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences , 1996 .
[26] Michael I. Jordan,et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems , 1994, NIPS.
[27] Michael I. Jordan,et al. Learning Without State-Estimation in Partially Observable Markovian Decision Processes , 1994, ICML.
[28] A. Harry Klopf,et al. Advantage Updating Applied to a Differential Game , 1994, NIPS.
[29] Eugene A. Feinberg,et al. Markov Decision Models with Weighted Discounted Criteria , 1994, Math. Oper. Res..
[30] Andrew G. Barto,et al. Adaptive linear quadratic control using policy iteration , 1994, Proceedings of 1994 American Control Conference - ACC '94.
[31] Michael I. Jordan,et al. Reinforcement Learning with Soft State Aggregation , 1994, NIPS.
[32] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[33] Benjamin Van Roy,et al. Feature-based methods for large scale dynamic programming , 1995 .
[34] Dimitri P. Bertsekas,et al. A Counterexample to Temporal Differences Learning , 1995, Neural Computation.
[35] Andrew G. Barto,et al. Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..
[36] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[37] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[38] John N. Tsitsiklis,et al. Analysis of Temporal-Difference Learning with Function Approximation , 1996, NIPS.
[39] Heidi Burgiel,et al. How to lose at Tetris , 1997, The Mathematical Gazette.
[40] Fernando J. Pineda,et al. Mean-Field Theory for Batched TD(λ) , 1997, Neural Computation.
[41] Wenju Liu,et al. A Model Approximation Scheme for Planning in Partially Observable Stochastic Domains , 1997, J. Artif. Intell. Res..
[42] Dimitri P. Bertsekas,et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming , 1997 .
[43] D. Bertsekas. Gradient convergence in gradient methods , 1997 .
[44] Vivek S. Borkar,et al. Stochastic Approximation for Nonexpansive Maps: Application to Q-Learning Algorithms , 1997, SIAM J. Control. Optim..
[45] Benjamin Van Roy,et al. A neuro-dynamic programming approach to retailer inventory management , 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[46] L. Trefethen,et al. Numerical linear algebra , 1997 .
[47] Xi-Ren Cao,et al. Perturbation realization, potentials, and sensitivity analysis of Markov processes , 1997, IEEE Trans. Autom. Control..
[48] Xi-Ren Cao,et al. Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization , 1998, IEEE Trans. Control. Syst. Technol..
[49] Andrew G. Barto,et al. Reinforcement learning , 1998 .
[50] Benjamin Van Roy. Learning and value function approximation in complex decision processes , 1998 .
[51] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.
[52] John N. Tsitsiklis,et al. Average cost temporal-difference learning , 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[53] John N. Tsitsiklis,et al. Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives , 1999, IEEE Trans. Autom. Control..
[54] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[55] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[56] Vivek S. Borkar,et al. Actor-Critic-Type Learning Algorithms for Markov Decision Processes , 1999, SIAM J. Control. Optim..
[57] Jonathan Baxter,et al. Reinforcement Learning From State and Temporal Differences , 1999 .
[58] X. Cao,et al. Single Sample Path-Based Optimization of Markov Chains , 1999 .
[59] Daphne Koller,et al. Policy Iteration for Factored MDPs , 2000, UAI.
[60] Arthur L. Samuel,et al. Some studies in machine learning using the game of checkers , 2000, IBM J. Res. Dev..
[61] Milos Hauskrecht,et al. Value-Function Approximations for Partially Observable Markov Decision Processes , 2000, J. Artif. Intell. Res..
[62] Benjamin Van Roy,et al. On the existence of fixed points for approximate value iteration and temporal-difference learning , 2000 .
[63] Vivek S. Borkar,et al. Learning Algorithms for Markov Decision Processes with Average Cost , 2001, SIAM J. Control. Optim..
[64] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[65] John N. Tsitsiklis,et al. Regression methods for pricing complex American-style options , 2001, IEEE Trans. Neural Networks.
[66] Eric A. Hansen,et al. An Improved Grid-Based Approximation Algorithm for POMDPs , 2001, IJCAI.
[67] Francis A. Longstaff,et al. Valuing American Options by Simulation: A Simple Least-Squares Approach , 2001 .
[68] Stephen D. Patek,et al. On terminating Markov decision processes with a risk-averse objective function , 2001, Autom..
[69] John N. Tsitsiklis,et al. Simulation-based optimization of Markov reward processes , 2001, IEEE Trans. Autom. Control..
[70] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[71] Jun S. Liu,et al. Monte Carlo strategies in scientific computing , 2001 .
[72] Sanjoy Dasgupta,et al. Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.
[73] Douglas Aberdeen,et al. Scalable Internal-State Policy-Gradient Methods for POMDPs , 2002, ICML.
[74] Eugene A. Feinberg,et al. Total Reward Criteria , 2002 .
[75] John N. Tsitsiklis,et al. Approximate Gradient Methods in Policy-Space Optimization of Markov Reward Processes , 2003, Discret. Event Dyn. Syst..
[76] Shobha Venkataraman,et al. Efficient Solution Algorithms for Factored MDPs , 2003, J. Artif. Intell. Res..
[77] Rémi Munos,et al. Error Bounds for Approximate Policy Iteration , 2003, ICML.
[78] Dimitri P. Bertsekas,et al. Least Squares Policy Evaluation Algorithms with Linear Function Approximation , 2003, Discret. Event Dyn. Syst..
[79] Benjamin Van Roy,et al. The Linear Programming Approach to Approximate Dynamic Programming , 2003, Oper. Res..
[80] Craig Boutilier,et al. Bounded Finite State Controllers , 2003, NIPS.
[81] Hagai Attias,et al. Planning by Probabilistic Inference , 2003, AISTATS.
[82] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[83] Yousef Saad,et al. Iterative methods for sparse linear systems , 2003 .
[84] Abhijit Gosavi,et al. Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning , 2003 .
[85] H. Kushner,et al. Stochastic Approximation and Recursive Algorithms and Applications , 2003 .
[86] John N. Tsitsiklis,et al. On Average Versus Discounted Reward Temporal-Difference Learning , 2002, Machine Learning.
[87] Peter Dayan,et al. The convergence of TD(λ) for general λ , 1992, Machine Learning.
[88] Abhijit Gosavi,et al. Reinforcement learning for long-run average cost , 2004, Eur. J. Oper. Res..
[89] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[90] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[91] John N. Tsitsiklis,et al. Asynchronous Stochastic Approximation and Q-Learning , 1994, Machine Learning.
[92] A. Barto,et al. Model-Based Adaptive Critic Designs , 2004 .
[93] Xi-Ren Cao. Learning and Optimization: From a System Theoretic Perspective , 2004 .
[94] Benjamin Van Roy,et al. On Constraint Sampling in the Linear Programming Approach to Approximate Dynamic Programming , 2004, Math. Oper. Res..
[95] Sridhar Mahadevan,et al. Average reward reinforcement learning: Foundations, algorithms, and empirical results , 2004, Machine Learning.
[96] A. Barto,et al. Reinforcement Learning in Large, High‐Dimensional State Spaces , 2004 .
[97] Derong Liu,et al. Direct Neural Dynamic Programming , 2004 .
[98] John N. Tsitsiklis,et al. Feature-based methods for large scale dynamic programming , 2004, Machine Learning.
[99] Jennie Si,et al. Handbook of Learning and Approximate Dynamic Programming (IEEE Press Series on Computational Intelligence) , 2004 .
[100] Justin A. Boyan,et al. Technical Update: Least-Squares Temporal Difference Learning , 2002, Machine Learning.
[101] Dimitri P. Bertsekas,et al. Discretized Approximations for POMDP with Average Cost , 2004, UAI.
[102] A. Barto,et al. Improved Temporal Difference Methods with Linear Function Approximation , 2004 .
[103] William D. Smart,et al. Interpolation-based Q-learning , 2004, ICML.
[104] Xi-Ren Cao,et al. A basic formula for online policy gradient algorithms , 2005, IEEE Transactions on Automatic Control.
[105] Ying He,et al. A Two-Timescale Simulation-Based Gradient Algorithm for Weighted Cost Markov Decision Processes , 2005, Proceedings of the 44th IEEE Conference on Decision and Control.
[106] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[107] Dimitri P. Bertsekas,et al. Dynamic Programming and Suboptimal Control: A Survey from ADP to MPC , 2005, Eur. J. Control.
[108] D. Bertsekas. Rollout Algorithms for Constrained Dynamic Programming , 2005 .
[109] Shie Mannor,et al. Basis Function Adaptation in Temporal Difference Reinforcement Learning , 2005, Ann. Oper. Res..
[110] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[111] Huizhen Yu,et al. A Function Approximation Approach to Estimation of Policy Gradient for POMDP with Structured Policies , 2005, UAI.
[112] Warren B. Powell,et al. Approximate dynamic programming for high dimensional resource allocation problems , 2005 .
[113] Shie Mannor,et al. A Tutorial on the Cross-Entropy Method , 2005, Ann. Oper. Res..
[114] András Lörincz,et al. Learning Tetris Using the Noisy Cross-Entropy Method , 2006, Neural Computation.
[115] Uriel G. Rothblum,et al. A Turnpike Theorem For A Risk-Sensitive Markov Decision Process with Stopping , 2006, SIAM J. Control. Optim..
[116] David Choi,et al. A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning , 2001, Discret. Event Dyn. Syst..
[117] Rajesh P. N. Rao,et al. Planning and Acting in Uncertain Environments using Probabilistic Inference , 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[118] Marc Toussaint,et al. Probabilistic inference for solving discrete and continuous state Markov Decision Processes , 2006, ICML.
[119] Shie Mannor,et al. Automatic basis function construction for approximate dynamic programming and reinforcement learning , 2006, ICML.
[120] Benjamin Van Roy,et al. Tetris: A Study of Randomized Constraint Sampling , 2006 .
[121] Vivek S. Borkar,et al. Adaptive Importance Sampling Technique for Markov Chains Using Stochastic Approximation , 2006, Oper. Res..
[122] Benjamin Van Roy. Performance Loss Bounds for Approximate Value Iteration with State Aggregation , 2006, Math. Oper. Res..
[123] Sean P. Meyn. Control Techniques for Complex Networks , 2007 .
[124] Xi-Ren Cao,et al. Stochastic learning and optimization - A sensitivity-based approach , 2007, Annu. Rev. Control..
[125] Dirk P. Kroese,et al. Simulation and the Monte Carlo method , 1981, Wiley series in probability and mathematical statistics.
[126] D. Bertsekas,et al. Solution of Large Systems of Equations Using Approximate Dynamic Programming Methods , 2007 .
[127] Bruno Scherrer,et al. Performance Bounds for Lambda Policy Iteration , 2007, ArXiv.
[128] T. Jung,et al. Kernelizing LSPE(λ) , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[129] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[130] D. Bertsekas,et al. A Least Squares Q-Learning Algorithm for Optimal Stopping Problems , 2007 .
[131] Frank L. Lewis,et al. Guest Editorial: Special Issue on Adaptive Dynamic Programming and Reinforcement Learning in Feedback Control , 2008, IEEE Trans. Syst. Man Cybern. Part B.
[132] Dimitri P. Bertsekas,et al. On Near Optimality of the Set of Finite-State Controllers for Average Cost POMDP , 2008, Math. Oper. Res..
[133] Jonathan P. How,et al. Approximate dynamic programming using support vector regression , 2008, 2008 47th IEEE Conference on Decision and Control.
[134] Csaba Szepesvári,et al. Finite-Time Bounds for Fitted Value Iteration , 2008, J. Mach. Learn. Res..
[135] Zhi-Qiang Liu,et al. Preconditioned temporal difference learning , 2008, ICML '08.
[136] Dimitri P. Bertsekas,et al. New error bounds for approximations from projected linear equations , 2008, 2008 46th Annual Allerton Conference on Communication, Control, and Computing.
[137] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008, Texts and Readings in Mathematics.
[138] Ioannis Ch. Paschalidis,et al. An actor-critic method using Least Squares Temporal Difference learning , 2009, Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference.
[139] Vivek S. Borkar,et al. Reinforcement Learning — A Bridge Between Numerical Methods and Monte Carlo , 2009 .
[140] Dimitri P. Bertsekas,et al. Basis function adaptation methods for cost approximation in MDP , 2009, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[141] Dimitri P. Bertsekas,et al. Convergence Results for Some Temporal Difference Methods Based on Least Squares , 2009, IEEE Transactions on Automatic Control.
[142] F.L. Lewis,et al. Reinforcement learning and adaptive dynamic programming for feedback control , 2009, IEEE Circuits and Systems Magazine.
[143] D. Bertsekas,et al. Approximate Solution of Large-Scale Linear Inverse Problems with Monte Carlo Simulation , 2009 .
[144] Dale Schuurmans,et al. Learning Exercise Policies for American Options , 2009, AISTATS.
[145] D. Bertsekas. Projected Equations, Variational Inequalities, and Temporal Difference Methods , 2009 .
[146] Warren B. Powell,et al. An Approximate Dynamic Programming Algorithm for Large-Scale Fleet Management: A Case Application , 2009, Transp. Sci..
[147] Dimitri P. Bertsekas,et al. Convex Optimization Theory , 2009 .
[148] D. Bertsekas,et al. Projected Equation Methods for Approximate Solution of Large Linear Systems , 2009, Journal of Computational and Applied Mathematics.
[149] Bruno Scherrer,et al. Improvements on Learning Tetris with Cross Entropy , 2009, J. Int. Comput. Games Assoc..
[150] Dimitri P. Bertsekas,et al. Distributed asynchronous policy iteration in dynamic programming , 2010, 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[151] Bruno Scherrer,et al. Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view , 2010, ICML.
[152] Dimitri P. Bertsekas,et al. Error Bounds for Approximations from Projected Linear Equations , 2010, Math. Oper. Res..
[153] Bart De Schutter,et al. Online least-squares policy iteration for reinforcement learning control , 2010, Proceedings of the 2010 American Control Conference.
[154] B. Scherrer,et al. Least-Squares Policy Iteration: Bias-Variance Trade-off in Control Problems , 2010, ICML.
[155] Huizhen Yu,et al. Convergence of Least Squares Temporal Difference Methods Under General Conditions , 2010, ICML.
[156] B. Scherrer,et al. Performance bound for Approximate Optimistic Policy Iteration , 2010 .
[157] Simon Haykin,et al. Neural Networks and Learning Machines , 2010 .
[158] Dimitri P. Bertsekas,et al. Pathologies of temporal difference methods in approximate dynamic programming , 2010, 49th IEEE Conference on Decision and Control (CDC).
[159] Csaba Szepesvári,et al. Reinforcement Learning Algorithms for MDPs , 2011 .
[160] Warren B. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality , 2007, Wiley Series in Probability and Statistics.
[161] D. Bertsekas. Approximate policy iteration: a survey and some new methods , 2011 .
[162] Dimitri P. Bertsekas,et al. Temporal Difference Methods for General Projected Equations , 2011, IEEE Transactions on Automatic Control.
[163] Huizhen Yu,et al. Least Squares Temporal Difference Methods: An Analysis under General Conditions , 2012, SIAM J. Control. Optim..
[164] Vivek F. Farias,et al. Approximate Dynamic Programming via a Smoothed Linear Program , 2009, Oper. Res..
[165] Richard S. Sutton,et al. Reinforcement Learning , 1992, Handbook of Machine Learning.