Variance-constrained actor-critic algorithms for discounted and average reward MDPs
[1] Shie Mannor, et al. Percentile Optimization for Markov Decision Processes with Parameter Uncertainty, 2010, Oper. Res.
[2] J. Spall. Multivariate stochastic approximation using a simultaneous perturbation gradient approximation, 1992.
[3] Vivek S. Borkar, et al. A Learning Algorithm for Risk-Sensitive Cost, 2008, Math. Oper. Res.
[4] Boris Polyak, et al. Acceleration of stochastic approximation by averaging, 1992.
[5] Shie Mannor, et al. Temporal Difference Methods for the Variance of the Reward To Go, 2013, ICML.
[6] P. Marbach. Simulation-Based Methods for Markov Decision Processes, 1998.
[7] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[8] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[9] E. Altman. Constrained Markov Decision Processes, 1999.
[10] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, 2008.
[11] Shalabh Bhatnagar, et al. Reinforcement Learning With Function Approximation for Traffic Signal Control, 2011, IEEE Transactions on Intelligent Transportation Systems.
[12] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[13] Vivek S. Borkar, et al. Q-Learning for Risk-Sensitive Control, 2002, Math. Oper. Res.
[14] Shalabh Bhatnagar, et al. Adaptive Newton-based multivariate smoothed functional algorithms for simulation optimization, 2007, TOMC.
[15] Michael Devetsikiotis, et al. An adaptive approach to accelerated evaluation of highly available services, 2007, TOMC.
[16] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[17] Ilya Segal, et al. Solutions Manual for Microeconomic Theory: Mas-Colell, Whinston and Green, 1997.
[18] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[19] Klaus Obermayer, et al. Risk-Sensitive Reinforcement Learning, 2013, Neural Computation.
[20] Sean P. Meyn, et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning, 2000, SIAM J. Control. Optim.
[21] Richard S. Sutton, et al. Temporal credit assignment in reinforcement learning, 1984.
[22] John N. Tsitsiklis, et al. Mean-Variance Optimization in Markov Decision Processes, 2011, ICML.
[23] G. Rappl. On Linear Convergence of a Class of Random Search Algorithms, 1989.
[24] V. Borkar. Learning Algorithms for Risk-Sensitive Control, 2010.
[25] Shie Mannor, et al. Policy Evaluation with Variance Related Risk Criteria in Markov Decision Processes, 2013, ArXiv.
[26] Andrzej Ruszczynski. Risk-averse dynamic programming for Markov decision processes, 2010, Math. Program.
[27] Shalabh Bhatnagar, et al. Adaptive multivariate three-timescale stochastic approximation algorithms for simulation based optimization, 2005, TOMC.
[28] A. Koopman, et al. Simulation and optimization of traffic in a city, 2004, IEEE Intelligent Vehicles Symposium.
[29] Ralph Neuneier, et al. Risk-Sensitive Reinforcement Learning, 1998, Machine Learning.
[30] Michael C. Fu, et al. Two-timescale simultaneous perturbation stochastic approximation using deterministic perturbation sequences, 2003, TOMC.
[31] Shie Mannor, et al. Distributionally Robust Markov Decision Processes, 2010, Math. Oper. Res.
[32] Klaus Obermayer, et al. A Unified Framework for Risk-sensitive Markov Decision Processes with Finite State and Action Spaces, 2011, ArXiv.
[33] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[34] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[35] Shie Mannor, et al. Policy Gradients with Variance Related Risk Criteria, 2012, ICML.
[36] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[37] Philip E. Gill, et al. Practical optimization, 1981.
[38] Morris W. Hirsch, et al. Convergent activation dynamics in continuous time networks, 1989, Neural Networks.
[39] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[40] Jack L. Treynor, et al. Mutual Fund Performance, 2007.
[41] Nathaniel Korda, et al. On TD(0) with function approximation: Concentration bounds and a centered variant with exponential convergence, 2014, ICML.
[42] John N. Tsitsiklis, et al. Algorithmic aspects of mean-variance optimization in Markov decision processes, 2013, Eur. J. Oper. Res.
[43] Dimitri P. Bertsekas, et al. Nonlinear Programming, 1997.
[44] James C. Spall, et al. A one-measurement form of simultaneous perturbation stochastic approximation, 1997, Autom.
[45] Shalabh Bhatnagar. An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes, 2010, Syst. Control. Lett.
[46] Shalabh Bhatnagar, et al. Incremental Natural Actor-Critic Algorithms, 2007, NIPS.
[47] M. J. Sobel. The variance of discounted Markov decision processes, 1982.
[48] J. Dippon, et al. Weighted Means in Stochastic Approximation of Minima, 1997.
[49] Paul R. Milgrom, et al. Envelope Theorems for Arbitrary Choice Sets, 2002.
[50] Jerzy A. Filar, et al. Variance-Penalized Markov Decision Processes, 1989, Math. Oper. Res.
[51] Vivek S. Borkar, et al. A sensitivity formula for risk-sensitive cost and the actor-critic algorithm, 2001, Syst. Control. Lett.
[52] Shie Mannor, et al. Variance Adjusted Actor Critic Algorithms, 2013, ArXiv.
[53] Vivek S. Borkar, et al. An actor-critic algorithm for constrained Markov decision processes, 2005, Syst. Control. Lett.
[54] R. J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[55] Mohammad Ghavamzadeh, et al. Actor-Critic Algorithms for Risk-Sensitive MDPs, 2013, NIPS.
[56] Shalabh Bhatnagar, et al. Natural actor-critic algorithms, 2009, Autom.
[57] Daniel Hernández-Hernández, et al. Risk Sensitive Markov Decision Processes, 1997.
[58] Shalabh Bhatnagar, et al. An Online Actor–Critic Algorithm with Function Approximation for Constrained Markov Decision Processes, 2012, J. Optim. Theory Appl.
[59] Laurent El Ghaoui, et al. Robust Control of Markov Decision Processes with Uncertain Transition Matrices, 2005, Oper. Res.
[60] M. Sion. On general minimax theorems, 1958.
[61] D. Krass, et al. Percentile performance criteria for limiting average Markov decision processes, 1995, IEEE Trans. Autom. Control.
[62] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[63] F. Downton. Stochastic Approximation, 1969, Nature.
[64] P. Schweitzer. Perturbation theory and finite Markov chains, 1968.
[65] Michael C. Fu, et al. Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control, 2015, ICML.
[66] J. Tsitsiklis, et al. Convergence rate of linear two-time-scale stochastic approximation, 2004, math/0405287.
[67] M. T. Wasan. Stochastic Approximation, 1969.
[68] Shalabh Bhatnagar, et al. Stochastic Recursive Algorithms for Optimization, 2012.
[69] J. Spall. Adaptive stochastic approximation by the simultaneous perturbation method, 1998, Proceedings of the 37th IEEE Conference on Decision and Control (Cat. No.98CH36171).
[70] V. Fabian. On Asymptotic Normality in Stochastic Approximation, 1968.
[71] Shalabh Bhatnagar, et al. Threshold Tuning Using Stochastic Optimization for Graded Signal Control, 2012, IEEE Transactions on Vehicular Technology.
[72] A. Mas-Colell, et al. Microeconomic Theory, 1995.
[73] M. A. Styblinski, et al. Algorithms and Software Tools for IC Yield Optimization Based on Fundamental Fabrication Parameters, 1986, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[74] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[75] G. Pflug. Stochastic Approximation Methods for Constrained and Unconstrained Systems - Kushner, H.J.; Clark, D.S., 1980.
[76] Shalabh Bhatnagar, et al. Stochastic approximation algorithms for constrained optimization via simulation, 2011, TOMC.
[77] Han-Fu Chen, et al. A Kiefer-Wolfowitz algorithm with randomized differences, 1999, IEEE Trans. Autom. Control.