Feature Selection for Neuro-Dynamic Programming
Sean P. Meyn | P. Mehta | A. Surana | Dayu Huang | Wei Chen
[1] C. Watkins. Learning from delayed rewards, 1989.
[2] Vivek S. Borkar et al. Optimal Control of Diffusion Processes, 1989.
[3] Lawrence M. Wein et al. Dynamic Scheduling of a Multiclass Make-to-Stock Queue, 2015, Oper. Res.
[4] Richard L. Tweedie et al. Markov Chains and Stochastic Stability, 1993, Communications and Control Engineering Series.
[5] Andrew G. Barto et al. Adaptive linear quadratic control using policy iteration, 1994, Proceedings of 1994 American Control Conference - ACC '94.
[6] John N. Tsitsiklis et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[7] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.
[8] John N. Tsitsiklis et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS 1996.
[9] John N. Tsitsiklis et al. Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives, 1999, IEEE Trans. Autom. Control.
[10] Sean P. Meyn et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning, 2000, SIAM J. Control Optim.
[11] Vivek S. Borkar et al. Convex Analytic Methods in Markov Decision Processes, 2002.
[12] Benjamin Van Roy et al. The Linear Programming Approach to Approximate Dynamic Programming, 2003, Oper. Res.
[13] Sean P. Meyn et al. Performance Evaluation and Policy Selection in Multiclass Networks, 2003, Discret. Event Dyn. Syst.
[14] Peter Dayan et al. Q-learning, 1992, Machine Learning.
[15] Peter Dayan et al. Technical Note: Q-Learning, 2004, Machine Learning.
[16] Benjamin Van Roy et al. An approximate dynamic programming approach to decentralized control of stochastic systems, 2006.
[17] Sean P. Meyn. Control Techniques for Complex Networks: Workload, 2007.
[18] Minyi Huang et al. Large-Population Cost-Coupled LQG Problems With Nonuniform Agents: Individual-Mass Behavior and Decentralized $\varepsilon$-Nash Equilibria, 2007, IEEE Transactions on Automatic Control.
[19] D. Bertsekas et al. Q-learning algorithms for optimal stopping based on least squares, 2007, 2007 European Control Conference (ECC).
[20] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, 2008.
[21] Sean P. Meyn et al. Shannon meets Bellman: Feature based Markovian models for detection and optimization, 2008, 2008 47th IEEE Conference on Decision and Control.
[22] Sean P. Meyn et al. Q-learning and Pontryagin's Minimum Principle, 2009, Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference.
[23] Sean P. Meyn et al. A Dynamic Newsboy Model for Optimal Reserve Management in Electricity Markets, 2009.
[24] Adam Wierman et al. Approximate dynamic programming using fluid and diffusion approximations with applications to power management, 2009, Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference.
[25] Vivek S. Borkar et al. A New Learning Algorithm for Optimal Stopping, 2009, Discret. Event Dyn. Syst.
[26] Frank L. Lewis et al. Adaptive optimal control for continuous-time linear systems based on policy iteration, 2009, Autom.
[27] Csaba Szepesvári et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.