Markov Decision Processes
[1] L. Shapley. Stochastic Games, 1953, Proceedings of the National Academy of Sciences.
[2] R. Bellman. The theory of dynamic programming, 1954.
[3] Ronald A. Howard, et al. Dynamic Programming and Markov Processes, 1960.
[4] K. Miyasawa. An Economic Survival Game, 1961.
[5] D. Blackwell. Discounted Dynamic Programming, 1965.
[6] Onésimo Hernández-Lerma, et al. Controlled Markov Processes, 1965.
[7] David Lindley, et al. How to Gamble If You Must (Inequalities for Stochastic Processes), 1966.
[8] K. Hinderer, et al. Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter, 1970.
[9] W. K. Hastings. Monte Carlo Sampling Methods Using Markov Chains and Their Applications, 1970.
[10] Edward J. Sondik, et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon, 1973, Oper. Res.
[11] J. K. Satia, et al. Markovian Decision Processes with Uncertain Transition Probabilities, 1973, Oper. Res.
[12] P. van Moerbeke. On optimal stopping and free boundary problems, 1976.
[13] Moshe Ben-Horim, et al. A linear programming approach, 1977.
[14] Richard Grinold. Finite horizon approximations of infinite horizon linear programs, 1977, Math. Program.
[15] Evan L. Porteus. Conditions for characterizing the structure of optimal strategies in infinite-horizon dynamic programs, 1982.
[16] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[17] Paul Bratley, et al. A guide to simulation, 1983.
[18] John Rust. Structural estimation of Markov decision processes, 1986.
[19] L. Devroye. Non-Uniform Random Variate Generation, 1986.
[20] J. Lasserre, et al. An on-line procedure in discounted infinite-horizon stochastic optimal control, 1986.
[21] E. Gilbert, et al. Optimal infinite-horizon feedback laws for a general class of constrained discrete-time systems: Stability and moving-horizon approximations, 1988.
[22] M. K rn, et al. Stochastic Optimal Control, 1988.
[23] O. Hernández-Lerma, et al. A forecast horizon and a stopping rule for general Markov decision processes, 1988.
[24] D. Yao, et al. Stochastic monotonicity in general queueing networks, 1989.
[25] D. Bertsekas, et al. Adaptive aggregation methods for infinite horizon dynamic programming, 1989.
[26] Christian M. Ernst, et al. Multi-armed Bandit Allocation Indices, 1989.
[27] O. Hernández-Lerma. Adaptive Markov Control Processes, 1989.
[28] O. Hernández-Lerma, et al. Error bounds for rolling horizon policies in discrete-time Markov control processes, 1990.
[29] D. Mayne, et al. Receding horizon control of nonlinear systems, 1990.
[30] J. Bather, et al. Multi-Armed Bandit Allocation Indices, 1990.
[31] Steven I. Marcus, et al. On the computation of the optimal cost function for discrete time Markov models with partial observations, 1991, Ann. Oper. Res.
[32] Ari Arapostathis, et al. On the average cost optimality equation and the structure of optimal policies for partially observable Markov decision processes, 1991, Ann. Oper. Res.
[33] Harald Niederreiter, et al. Random Number Generation and Quasi-Monte Carlo Methods, 1992, CBMS-NSF Regional Conference Series in Applied Mathematics.
[34] R. Weber. On the Gittins Index for Multiarmed Bandits, 1992.
[35] Shaler Stidham, et al. A survey of Markov decision models for control of networks of queues, 1993, Queueing Syst. Theory Appl.
[36] V. Borkar. White-noise representations in stochastic realization theory, 1993.
[37] M. K. Ghosh, et al. Discrete-time controlled Markov processes with average cost criterion: a survey, 1993.
[38] Anders Martin-Löf. Lectures on the use of control theory in insurance, 1994.
[39] Chelsea C. White, et al. Markov Decision Processes with Imprecise Transition Probabilities, 1994, Oper. Res.
[40] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994, Wiley Series in Probability and Statistics.
[41] John Rust. Using Randomization to Break the Curse of Dimensionality, 1997.
[42] Leslie Pack Kaelbling, et al. On the Complexity of Solving Markov Decision Problems, 1995, UAI.
[43] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, 1995.
[44] V. Rykov, et al. Controlled Queueing Systems, 1995.
[45] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.
[46] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[47] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[48] Awi Federgruen, et al. Detection of minimal forecast horizons in dynamic programs with multiple indicators of the future, 1996.
[50] W. N. Patten, et al. A sliding horizon feedback control problem with feedforward and disturbance, 1997.
[51] E. Altman, et al. On submodular value functions and complex dynamic programming, 1998.
[52] Masanori Hosaka, et al. Controlled Markov Set-Chains with Discounting, 1998.
[53] L. Sennott. Stochastic Dynamic Programming and the Control of Queueing Systems, 1998.
[54] Steven I. Marcus, et al. Simulation-Based Algorithms for Average Cost Markov Decision Processes, 1999.
[55] O. Hernández-Lerma, et al. Discrete-time Markov control processes, 1999.
[56] E. Altman. Constrained Markov Decision Processes, 1999.
[57] Jay H. Lee, et al. Model predictive control: past, present and future, 1999.
[58] Craig Boutilier, et al. Decision-Theoretic Planning: Structural Assumptions and Computational Leverage, 1999, J. Artif. Intell. Res.
[59] Thomas Parisini, et al. Neural approximators and team theory for dynamic routing: a receding-horizon approach, 1999, Proceedings of the 38th IEEE Conference on Decision and Control.
[60] Daniel Hernández-Hernández, et al. Risk sensitive control of finite state Markov chains in discrete time, with applications to portfolio management, 1999, Math. Methods Oper. Res.
[62] Daphne Koller, et al. Policy Iteration for Factored MDPs, 2000, UAI.
[63] John N. Tsitsiklis, et al. A survey of computational complexity results in systems and control, 2000, Autom.
[64] H. Kushner. Numerical Methods for Stochastic Control Problems in Continuous Time, 2000.
[65] Robert Givan, et al. Bounded-parameter Markov decision processes, 2000, Artif. Intell.
[66] S. Marcus, et al. A Simulation-Based Policy Iteration Algorithm for Average Cost Unichain Markov Decision Processes, 2000.
[67] Michael C. Fu, et al. Monotone Optimal Policies for a Transient Queueing Staffing Problem, 2000, Oper. Res.
[68] Renaud Lecoeuche. Learning Optimal Dialogue Management Rules by Using Reinforcement Learning and Inductive Logic Programming, 2001, NAACL.
[69] Kurt Driessens, et al. Speeding Up Relational Reinforcement Learning through the Use of an Incremental First Order Decision Tree Learner, 2001, ECML.
[70] John N. Tsitsiklis, et al. Simulation-based optimization of Markov reward processes, 2001, IEEE Trans. Autom. Control.
[71] Craig Boutilier, et al. Symbolic Dynamic Programming for First-Order MDPs, 2001, IJCAI.
[72] Shie Mannor, et al. PAC Bounds for Multi-armed Bandit and Markov Decision Processes, 2002, COLT.
[73] Vivek S. Borkar, et al. Convex Analytic Methods in Markov Decision Processes, 2002.
[74] Benjamin Van Roy. Neuro-Dynamic Programming: Overview and Recent Trends, 2002.
[75] Suresh P. Sethi, et al. Forecast, Solution, and Rolling Horizons in Operations Management Problems: A Classified Bibliography, 2002, Manuf. Serv. Oper. Manag.
[76] Eugene A. Feinberg, et al. Handbook of Markov Decision Processes, 2002.
[77] W. A. van den Broek. Moving horizon control in dynamic games, 2002.
[78] James E. Smith, et al. Structural Properties of Stochastic Dynamic Programs, 2002, Oper. Res.
[79] Sean P. Meyn, et al. Risk-Sensitive Optimal Control for Markov Decision Processes with Monotone Cost, 2002, Math. Oper. Res.
[80] Robert Givan, et al. Inductive Policy Selection for First-Order MDPs, 2002, UAI.
[81] Carlos Guestrin, et al. Generalizing Plans to New Environments in Relational MDPs, 2003, IJCAI.
[82] John N. Tsitsiklis, et al. Approximate Gradient Methods in Policy-Space Optimization of Markov Reward Processes, 2003, Discret. Event Dyn. Syst.
[83] Robert Givan, et al. Approximate Policy Iteration with a Policy Language Bias, 2003, NIPS.
[84] H. Föllmer, et al. American Options, Multi-armed Bandits, and Optimal Consumption Plans: A Unifying View, 2003.
[85] Abhijit Gosavi, et al. Simulation-Based Optimization, 2003.
[86] Kurt Driessens, et al. Relational Instance Based Regression for Relational Reinforcement Learning, 2003, ICML.
[87] Benjamin Van Roy, et al. The Linear Programming Approach to Approximate Dynamic Programming, 2003, Oper. Res.
[88] Luc De Raedt, et al. Logical Markov Decision Programs, 2003.
[89] Henk C. Tijms. A First Course in Stochastic Models, 2003.
[90] Vijay R. Konda, et al. On Actor-Critic Algorithms, 2003, SIAM J. Control. Optim.
[91] L. Kallenberg. Finite State and Action MDPs, 2003.
[92] William L. Cooper, et al. Convergence of Simulation-Based Policy Iteration, 2003, Probability in the Engineering and Informational Sciences.
[93] Abhijit Gosavi, et al. Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning, 2003.
[94] Martin L. Puterman, et al. Coffee, Tea, or ...?: A Markov Decision Process Model for Airline Meal Provisioning, 2004, Transp. Sci.
[95] Thomas G. Dietterich, et al. Explanation-Based Learning and Reinforcement Learning: A Unified View, 1997, Machine Learning.
[96] Ness B. Shroff, et al. Markov Decision Processes with Uncertain Transition Rates: Sensitivity and Max-Min Control, 2008.
[97] Manfred Schäl. On Discrete-Time Dynamic Programming in Insurance: Exponential Utility and Minimizing the Ruin Probability, 2004.
[98] Haitao Fang, et al. Potential-based online policy iteration algorithms for Markov decision processes, 2004, IEEE Trans. Autom. Control.
[99] Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning, 2004, Machine Learning.
[100] John N. Tsitsiklis, et al. Asynchronous Stochastic Approximation and Q-Learning, 1994, Machine Learning.
[101] Peter Dayan, et al. Technical Note: Q-Learning, 1992, Machine Learning.
[102] M. van Otterlo. Reinforcement Learning for Relational MDPs, 2004.
[103] Dimitri P. Bertsekas, et al. Dynamic Programming and Suboptimal Control: A Survey from ADP to MPC, 2005, Eur. J. Control.
[104] Laurent El Ghaoui, et al. Robust Control of Markov Decision Processes with Uncertain Transition Matrices, 2005, Oper. Res.
[105] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 2005, IEEE Transactions on Neural Networks.
[106] Casey A. Volino. A First Course in Stochastic Models, 2005, Technometrics.
[107] A. Willsky, et al. Importance sampling actor-critic algorithms, 2006, 2006 American Control Conference.
[108] Thomas Gärtner, et al. Graph kernels and Gaussian processes for relational reinforcement learning, 2006, Machine Learning.
[109] Albert Nikolaevich Shiryaev, et al. Optimal Stopping and Free-Boundary Problems, 2006.
[110] Pravin Varaiya, et al. Simulation-based Uniform Value Function Estimates of Markov Decision Processes, 2006, SIAM J. Control. Optim.
[111] Sean P. Meyn. Control Techniques for Complex Networks, 2007.
[113] Michel Denuit, et al. Association and heterogeneity of insured lifetimes in the Lee–Carter framework, 2007.
[114] Edwin K. P. Chong, et al. Solving Controlled Markov Set-Chains With Discounting via Multipolicy Improvement, 2007, IEEE Transactions on Automatic Control.
[115] Dimitri P. Bertsekas, et al. Stochastic Optimal Control: The Discrete Time Case, 2007.
[116] Alfredo García, et al. A Decentralized Approach to Discrete Optimization via Simulation: Application to Network Flow, 2007, Oper. Res.
[117] Warren B. Powell, et al. Approximate Dynamic Programming: Solving the Curses of Dimensionality (Wiley Series in Probability and Statistics), 2007.
[118] Hyeong Soo Chang. Finite-Step Approximation Error Bounds for Solving Average-Reward-Controlled Markov Set-Chains, 2008, IEEE Transactions on Automatic Control.
[119] Alfredo García, et al. A Game-Theoretic Approach to Efficient Power Management in Sensor Networks, 2008, Oper. Res.
[120] Xianping Guo, et al. Continuous-Time Markov Decision Processes: Theory and Applications, 2009.
[121] Hyeong Soo Chang. Decentralized Learning in Finite Markov Chains: Revisited, 2009, IEEE Transactions on Automatic Control.
[122] Michael Taksar. Stochastic Control in Insurance, 2010.
[123] Warren B. Powell, et al. Optimal control of dosage decisions in controlled ovarian hyperstimulation, 2010, Ann. Oper. Res.
[124] Warren B. Powell, et al. A dynamic model for the failure replacement of aging high-voltage transformers, 2010.
[125] Dimitri P. Bertsekas, et al. Approximate Dynamic Programming, 2010, Encyclopedia of Machine Learning.
[126] P. Schrimpf. Dynamic Programming, 2011.
[127] U. Rieder, et al. Markov Decision Processes with Applications to Finance, 2011.
[128] Patrick L. Ode. How to Gamble if You Must: Inequalities for Stochastic Processes, 2012.
[129] L. De Raedt, et al. Relational Reinforcement Learning, 2022.