Model-building semi-Markov adaptive critics
[1] Abhijit Gosavi,et al. Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning , 2003 .
[2] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[3] Emmanuel Fernandez,et al. Control of a re-entrant line manufacturing model with a reinforcement learning approach , 2007, Sixth International Conference on Machine Learning and Applications (ICMLA 2007).
[4] Abhijit Gosavi,et al. Reinforcement learning for long-run average cost , 2004, Eur. J. Oper. Res..
[5] V. Borkar. Stochastic approximation with two time scales , 1997 .
[6] Paul J. Werbos,et al. Building and Understanding Adaptive Systems: A Statistical/Numerical Approach to Factory Automation and Brain Research , 1987, IEEE Transactions on Systems, Man, and Cybernetics.
[7] Richard S. Sutton,et al. Reinforcement Learning , 1992, Handbook of Machine Learning.
[8] Shin Ishii,et al. A model-based reinforcement learning: a computational model and an fMRI study , 2003, ESANN.
[9] Jürgen Schmidhuber,et al. Model-based reinforcement learning for evolving soccer strategies , 2001 .
[10] R. J. Williams,et al. On the use of backpropagation in associative reinforcement learning , 1988, IEEE 1988 International Conference on Neural Networks.
[11] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[12] Abhijit Gosavi,et al. Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning , 2003 .
[13] S. Mahadevan,et al. Solving Semi-Markov Decision Problems Using Average Reward Reinforcement Learning , 1999 .
[14] Tapas K. Das,et al. A reinforcement learning approach to a single leg airline revenue management problem with multiple fare classes and overbooking , 2002 .
[15] Prasad Tadepalli,et al. Model-Based Average Reward Reinforcement Learning , 1998, Artif. Intell..
[16] Richard S. Sutton,et al. A Menu of Designs for Reinforcement Learning Over Time , 1995 .
[17] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[18] Junichiro Yoshimoto,et al. Control of exploitation-exploration meta-parameter in reinforcement learning , 2002, Neural Networks.
[19] S. Shankar Sastry,et al. Autonomous Helicopter Flight via Reinforcement Learning , 2003, NIPS.
[20] Abhijit Gosavi,et al. Model-Building for Robust Reinforcement Learning , 2010 .
[21] A. Barto,et al. Model-Based Adaptive Critic Designs , 2004 .
[22] Pieter Abbeel,et al. Autonomous Autorotation of an RC Helicopter , 2008, ISER.
[23] Paul J. Werbos,et al. Consistency of HDP applied to a simple reinforcement learning problem , 1990, Neural Networks.
[24] R. Bellman. Dynamic programming , 1957, Science.
[25] Abhijit Gosavi. Reinforcement learning for model building and variance-penalized control , 2009, Proceedings of the 2009 Winter Simulation Conference (WSC).
[26] Abhijit Gosavi,et al. A Reinforcement Learning Algorithm Based on Policy Iteration for Average Reward: Empirical Results with Yield Management and Convergence Analysis , 2004, Machine Learning.
[27] Abhijit Gosavi,et al. Semi-Markov adaptive critic heuristics with application to airline revenue management , 2011 .
[28] Vivek S. Borkar,et al. Actor-Critic - Type Learning Algorithms for Markov Decision Processes , 1999, SIAM J. Control. Optim..
[29] Shalabh Bhatnagar,et al. Actor-critic algorithms for hierarchical Markov decision processes , 2006, Autom..
[30] Abhijit Gosavi. Adaptive Critics for Airline Revenue Management , 2007 .
[31] Steven I. Marcus,et al. Simulation-based Algorithms for Markov Decision Processes , 2013 .
[32] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[33] Ashutosh Saxena,et al. High speed obstacle avoidance using monocular vision and reinforcement learning , 2005, ICML.
[34] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008 .
[35] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[36] Panos M. Pardalos,et al. Approximate dynamic programming: solving the curses of dimensionality , 2009, Optim. Methods Softw..
[37] Ronald A. Howard,et al. Dynamic Programming and Markov Processes , 1960 .
[38] Mala Gosakan,et al. Human performance modeling for emergency management decision making , 2010 .
[39] Andrew G. Barto,et al. Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..
[40] Andrew G. Barto,et al. Reinforcement learning , 1998 .
[41] Warren B. Powell,et al. Approximate Dynamic Programming - Solving the Curses of Dimensionality , 2007 .