暂无分享,去创建一个
[1] Apostolos Burnetas,et al. Optimal Adaptive Policies for Markov Decision Processes , 1997, Math. Oper. Res..
[2] R. Howard,et al. Risk-Sensitive Markov Decision Processes , 1972 .
[3] J. Andrew Bagnell,et al. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy , 2010 .
[4] Hilbert J. Kappen,et al. Dynamic policy programming , 2010, J. Mach. Learn. Res..
[5] Marc Teboulle,et al. Mirror descent and nonlinear projected subgradient methods for convex optimization , 2003, Oper. Res. Lett..
[6] Vicenç Gómez,et al. Fast rates for online learning in Linearly Solvable Markov Decision Processes , 2017, COLT.
[7] Manfred K. Warmuth,et al. The weighted majority algorithm , 1989, 30th Annual Symposium on Foundations of Computer Science.
[8] Kavosh Asadi,et al. A New Softmax Operator for Reinforcement Learning , 2016, ArXiv.
[9] Stefan Schaal,et al. Path integral control and bounded rationality , 2011, 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[10] Matthieu Geist,et al. Approximate Modified Policy Iteration , 2012, ICML.
[11] Jing Peng,et al. Function Optimization using Connectionist Reinforcement Learning Algorithms , 1991 .
[12] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[13] R. Rockafellar. Monotone Operators and the Proximal Point Algorithm , 1976 .
[14] Csaba Szepesvári,et al. Online Markov Decision Processes Under Bandit Feedback , 2010, IEEE Transactions on Automatic Control.
[15] Xi-Ren Cao,et al. Stochastic learning and optimization - A sensitivity-based approach , 2007, Annual Reviews in Control.
[16] Sanjeev Arora,et al. The Multiplicative Weights Update Method: a Meta-Algorithm and Applications , 2012, Theory Comput..
[17] John Darzentas,et al. Problem Complexity and Method Efficiency in Optimization , 1983 .
[18] Sergey Levine,et al. Guided Policy Search via Approximate Mirror Descent , 2016, NIPS.
[19] Yoav Freund,et al. A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.
[20] Yasemin Altun,et al. Relative Entropy Policy Search , 2010 .
[21] Roy Fox,et al. Taming the Noise in Reinforcement Learning via Soft Updates , 2015, UAI.
[22] Vladimir Vovk,et al. Aggregating strategies , 1990, COLT '90.
[23] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[24] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[25] Csaba Szepesvári,et al. A Generalized Reinforcement-Learning Model: Convergence and Applications , 1996, ICML.
[26] R. Bellman,et al. Dynamic Programming and Markov Processes , 1960 .
[27] Marek Petrik,et al. An Approximate Solution Method for Large Risk-Averse Markov Decision Processes , 2012, UAI.
[28] Daniel Hernández-Hernández,et al. Risk Sensitive Markov Decision Processes , 1997 .
[29] Vladimir N. Vapnik,et al. The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.
[30] Gábor Lugosi,et al. Prediction, learning, and games , 2006 .
[31] Manfred K. Warmuth,et al. Relative Loss Bounds for Multidimensional Regression Problems , 1997, Machine Learning.
[32] Andrzej Ruszczynski,et al. Risk-averse dynamic programming for Markov decision processes , 2010, Math. Program..
[33] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[34] Lin Xiao,et al. Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization , 2009, J. Mach. Learn. Res..
[35] Andrew McCallum,et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.
[36] Anind K. Dey,et al. Modeling Interaction via the Principle of Maximum Causal Entropy , 2010, ICML.
[37] András György,et al. Online Learning in Markov Decision Processes with Changing Cost Sequences , 2014, ICML.
[38] Koray Kavukcuoglu,et al. PGQ: Combining policy gradient and Q-learning , 2016, ArXiv.
[39] Doina Precup,et al. A Convergent Form of Approximate Policy Iteration , 2002, NIPS.
[40] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[41] John Langford,et al. Approximately Optimal Approximate Reinforcement Learning , 2002, ICML.
[42] Yishay Mansour,et al. Online Markov Decision Processes , 2009, Math. Oper. Res..
[43] H. Brendan McMahan,et al. A survey of Algorithms and Analysis for Adaptive Online Learning , 2014, J. Mach. Learn. Res..
[44] Ambuj Tewari,et al. Composite objective mirror descent , 2010, COLT 2010.
[45] J. W. Nieuwenhuis,et al. Boekbespreking van D.P. Bertsekas (ed.), Dynamic programming and optimal control - volume 2 , 1999 .
[46] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[47] Gergely Neu,et al. Online learning in episodic Markovian decision processes by relative entropy policy search , 2013, NIPS.
[48] Shai Shalev-Shwartz,et al. Online Learning and Online Convex Optimization , 2012, Found. Trends Mach. Learn..
[49] M. Puterman,et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems , 1978 .
[50] N. Roy,et al. On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference , 2013 .
[51] Vicenç Gómez,et al. Dynamic Policy Programming with Function Approximation , 2011, AISTATS.
[52] B. Martinet. Perturbation des méthodes d'optimisation. Applications , 1978 .
[53] Sean R Eddy,et al. What is dynamic programming? , 2004, Nature Biotechnology.