TD-learning with exploration
[1] J. Michael Harrison, et al. Dynamic Control of a Queue with Adjustable Service Rate, 2001, Oper. Res.
[2] Sean P. Meyn, et al. Optimal cross-layer wireless control policies using TD learning, 2010, 49th IEEE Conference on Decision and Control (CDC).
[3] Benjamin Van Roy, et al. The Linear Programming Approach to Approximate Dynamic Programming, 2003, Oper. Res.
[4] A. Wierman, et al. Optimality, fairness, and robustness in speed scaling designs, 2010, SIGMETRICS '10.
[5] Minyi Huang, et al. Large-Population Cost-Coupled LQG Problems With Nonuniform Agents: Individual-Mass Behavior and Decentralized $\varepsilon$-Nash Equilibria, 2007, IEEE Transactions on Automatic Control.
[6] Sean P. Meyn, et al. An analysis of reinforcement learning with function approximation, 2008, ICML '08.
[7] Graham C. Goodwin, et al. Adaptive filtering prediction and control, 1984.
[8] E. Nummelin. General irreducible Markov chains and non-negative operators: List of symbols and notation, 1984.
[9] Richard L. Tweedie, et al. Markov Chains and Stochastic Stability, 1993, Communications and Control Engineering Series.
[10] Sean P. Meyn, et al. Q-learning and Pontryagin's Minimum Principle, 2009, Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference.
[11] Tommi S. Jaakkola, et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms, 2000, Machine Learning.
[12] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[13] S. Meyn, et al. Large Deviations Asymptotics and the Spectral Theory of Multiplicatively Regular Markov Processes, 2005, math/0509310.
[14] S. Meyn, et al. Spectral theory and limit theorems for geometrically ergodic Markov processes, 2002, math/0209200.
[15] Boris Polyak, et al. Acceleration of stochastic approximation by averaging, 1992.
[16] Eugene A. Feinberg, et al. Handbook of Markov Decision Processes, 2002.
[17] E. Seneta. Non-negative Matrices and Markov Chains, 2008.
[18] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, 2008.
[19] Sunil Kumar, et al. Approximate and Data-Driven Dynamic Programming for Queueing Networks, 2008, Decision, Risk & Operations Working Papers Series.
[20] Sean P. Meyn, et al. Quasi stochastic approximation, 2011, Proceedings of the 2011 American Control Conference.
[21] A. F. Veinott. Discrete Dynamic Programming with Sensitive Discount Optimality Criteria, 1969.
[22] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[23] Chris Watkins. Learning from delayed rewards, 1989.
[24] D. Bertsekas, et al. Q-learning algorithms for optimal stopping based on least squares, 2007, 2007 European Control Conference (ECC).
[25] Dimitri P. Bertsekas. Q-learning and enhanced policy iteration in discounted dynamic programming, 2010, 49th IEEE Conference on Decision and Control (CDC).
[26] Adam Wierman, et al. Approximate dynamic programming using fluid and diffusion approximations with applications to power management, 2009, Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference.
[27] Csaba Szepesvári, et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[28] Anant Sahai, et al. Towards a Communication-Theoretic Understanding of System-Level Power Consumption, 2010, IEEE Journal on Selected Areas in Communications.
[29] Sean P. Meyn, et al. Feature Selection for Neuro-Dynamic Programming, 2013.
[30] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[31] Shie Mannor, et al. Basis Function Adaptation in Temporal Difference Reinforcement Learning, 2005, Ann. Oper. Res.
[32] M. Veatch. Approximate Dynamic Programming for Networks: Fluid Models and Constraint Reduction, 2004.
[33] S. Meyn, et al. Multiplicative ergodicity and large deviations for an irreducible Markov chain, 2000.
[34] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.
[35] Benjamin Van Roy, et al. An approximate dynamic programming approach to decentralized control of stochastic systems, 2006.
[36] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS 1996.
[37] Sean P. Meyn. Control Techniques for Complex Networks: Workload, 2007.
[38] E. Nummelin. General irreducible Markov chains and non-negative operators: Positive and null recurrence, 1984.
[39] S. Balaji, et al. Multiplicative ergodicity and large deviations for an irreducible Markov chain, 2000.