Feature-based aggregation and deep reinforcement learning: a survey and some new implementations
[1] Arthur L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev.
[2] A. L. Samuel, et al. Some studies in machine learning using the game of checkers. II: recent progress, 1967.
[3] A. G. Ivakhnenko, et al. Polynomial Theory of Complex Systems, 1971, IEEE Trans. Syst. Man Cybern.
[4] M. A. Krasnoselʹskii, et al. Approximate Solution of Operator Equations, 1972.
[5] W. Miranker, et al. Acceleration by aggregation of successive approximation methods, 1982.
[6] Roy Mendelssohn, et al. An Iterative Aggregation Procedure for Markov Decision Processes, 1982, Oper. Res.
[7] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[8] John H. Holland, et al. Escaping brittleness: the possibilities of general-purpose learning algorithms applied to parallel rule-based systems, 1995.
[9] Richard E. Korf, et al. A Unified Theory of Heuristic Evaluation Functions and its Application to Learning, 1986, AAAI.
[10] Robert L. Smith, et al. Aggregation in Dynamic Programming, 1987, Oper. Res.
[11] Gerald Tesauro, et al. Connectionist Learning of Expert Preferences by Comparison Training, 1988, NIPS.
[12] Ken-ichi Funahashi, et al. On the approximate realization of continuous mappings by neural networks, 1989, Neural Networks.
[13] Gerald Tesauro, et al. Neurogammon Wins Computer Olympiad, 1989, Neural Computation.
[14] D. Bertsekas, et al. Adaptive aggregation methods for infinite horizon dynamic programming, 1989.
[15] George Cybenko, et al. Approximation by superpositions of a sigmoidal function, 1989, Math. Control. Signals Syst.
[16] Kurt Hornik, et al. Multilayer feedforward networks are universal approximators, 1989, Neural Networks.
[17] Bruce Abramson, et al. Expected-Outcome: A General Model of Static Evaluation, 1990, IEEE Trans. Pattern Anal. Mach. Intell.
[18] L. Jones. Constructive approximations for neural networks by sigmoidal functions, 1990, Proc. IEEE.
[19] James R. Evans, et al. Aggregation and Disaggregation Techniques and Methodology in Optimization, 1991, Oper. Res.
[20] John N. Tsitsiklis, et al. An Analysis of Stochastic Shortest Path Problems, 1991, Math. Oper. Res.
[21] Timothy Masters, et al. Multilayer Feedforward Networks, 1993.
[22] J. Douglas, et al. A unified convergence theory for abstract multigrid or multilevel algorithms, serial and parallel, 1993.
[23] John N. Tsitsiklis, et al. Asynchronous stochastic approximation and Q-learning, 1993, Proceedings of 32nd IEEE Conference on Decision and Control.
[24] Heekuck Oh, et al. Neural Networks for Pattern Recognition, 1993, Adv. Comput.
[25] Allan Pinkus, et al. Multilayer Feedforward Networks with a Non-Polynomial Activation Function Can Approximate Any Function, 1991, Neural Networks.
[26] Gerald Tesauro, et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play, 1994, Neural Computation.
[27] Michael I. Jordan, et al. Reinforcement Learning with Soft State Aggregation, 1994, NIPS.
[28] Gerald Tesauro, et al. Temporal Difference Learning and TD-Gammon, 1995, J. Int. Comput. Games Assoc.
[29] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[30] Gerald Tesauro, et al. TD-Gammon: A Self-Teaching Backgammon Program, 1995.
[31] Dimitri P. Bertsekas, et al. A Counterexample to Temporal Differences Learning, 1995, Neural Computation.
[32] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[33] Gerald Tesauro, et al. On-line Policy Improvement using Monte-Carlo Search, 1996, NIPS.
[34] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[35] A. Kirsch. An Introduction to the Mathematical Theory of Inverse Problems, 1996, Applied Mathematical Sciences.
[36] John N. Tsitsiklis, et al. Rollout Algorithms for Combinatorial Optimization, 1997, J. Heuristics.
[37] Stuart J. Russell, et al. Reinforcement Learning with Hierarchies of Machines, 1997, NIPS.
[38] D. Bertsekas. Gradient convergence in gradient methods, 1997.
[39] Milos Hauskrecht, et al. Hierarchical Solution of Markov Decision Processes using Macro-actions, 1998, UAI.
[40] Andrew G. Barto, et al. Reinforcement learning, 1998.
[41] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[42] Dimitri P. Bertsekas, et al. Rollout Algorithms for Stochastic Scheduling Problems, 1999, J. Heuristics.
[43] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition, 1999, J. Artif. Intell. Res.
[44] John N. Tsitsiklis, et al. Gradient Convergence in Gradient Methods with Errors, 1999, SIAM J. Optim.
[45] Gerald Tesauro, et al. Comparison training of chess evaluation functions, 2001.
[46] Roberto Frias, et al. A brief survey, 2011.
[47] Gerald Tesauro, et al. Programming backgammon using self-teaching neural nets, 2002, Artif. Intell.
[48] Robert Givan, et al. Approximate Policy Iteration with a Policy Language Bias, 2003, NIPS.
[49] Yousef Saad, et al. Iterative methods for sparse linear systems, 2003.
[50] Abhijit Gosavi, et al. Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning, 2003.
[51] Gerald Tesauro, et al. Practical issues in temporal difference learning, 1992, Machine Learning.
[52] John N. Tsitsiklis, et al. Feature-based methods for large scale dynamic programming, 2004, Machine Learning.
[53] Dimitri P. Bertsekas, et al. Discretized Approximations for POMDP with Average Cost, 2004, UAI.
[54] Shie Mannor, et al. Basis Function Adaptation in Temporal Difference Reinforcement Learning, 2005, Ann. Oper. Res.
[55] Michael C. Fu, et al. An Adaptive Sampling Algorithm for Solving Markov Decision Processes, 2005, Oper. Res.
[56] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[57] Warren B. Powell, et al. Handbook of Learning and Approximate Dynamic Programming, 2006, IEEE Transactions on Automatic Control.
[58] Shie Mannor, et al. Automatic basis function construction for approximate dynamic programming and reinforcement learning, 2006, ICML.
[59] Yee Whye Teh, et al. A Fast Learning Algorithm for Deep Belief Nets, 2006, Neural Computation.
[60] Rémi Coulom, et al. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, 2006, Computers and Games.
[61] Benjamin Van Roy. Performance Loss Bounds for Approximate Value Iteration with State Aggregation, 2006, Math. Oper. Res.
[62] Xi-Ren Cao, et al. Stochastic learning and optimization - A sensitivity-based approach, 2007, Annu. Rev. Control.
[63] Bruno Scherrer, et al. Performance Bounds for Lambda Policy Iteration and Application to the Game of Tetris, 2007.
[64] Frank L. Lewis, et al. Guest Editorial: Special Issue on Adaptive Dynamic Programming and Reinforcement Learning in Feedback Control, 2008, IEEE Trans. Syst. Man Cybern. Part B.
[65] Andrew G. Barto, et al. Efficient skill learning using abstraction selection, 2009, IJCAI.
[66] Dimitri P. Bertsekas, et al. Basis function adaptation methods for cost approximation in MDP, 2009, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[67] Dimitri P. Bertsekas, et al. Error Bounds for Approximations from Projected Linear Equations, 2010, Math. Oper. Res.
[68] Shie Mannor, et al. Adaptive Bases for Reinforcement Learning, 2010, ECML/PKDD.
[69] Bart De Schutter, et al. Approximate Dynamic Programming and Reinforcement Learning, 2010, Interactive Collaborative Information Systems.
[70] Bart De Schutter, et al. Reinforcement Learning and Dynamic Programming Using Function Approximators, 2010.
[71] Simon Haykin, et al. Neural Networks and Learning Machines, 2010.
[72] Dimitri P. Bertsekas, et al. Approximate Dynamic Programming, 2017, Encyclopedia of Machine Learning and Data Mining.
[73] Csaba Szepesvári, et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[74] Warren B. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2007, Wiley Series in Probability and Statistics.
[75] D. Bertsekas. Approximate policy iteration: a survey and some new methods, 2011.
[76] Dimitri P. Bertsekas, et al. Temporal Difference Methods for General Projected Equations, 2011, IEEE Transactions on Automatic Control.
[77] Michèle Sebag, et al. Pilot, Rollout and Monte Carlo Tree Search Methods for Job Shop Scheduling, 2012, LION.
[78] Frank L. Lewis, et al. Optimal Adaptive Control and Differential Games by Reinforcement Learning Principles, 2012.
[79] Simon M. Lucas, et al. A Survey of Monte Carlo Tree Search Methods, 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[80] Dimitri P. Bertsekas, et al. Rollout Algorithms for Discrete Optimization: A Survey, 2012.
[81] Frank L. Lewis, et al. Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, 2012.
[82] D. Bertsekas, et al. Weighted Bellman Equations and their Applications in Approximate Dynamic Programming, 2012.
[83] Dimitri P. Bertsekas, et al. Abstract Dynamic Programming, 2013.
[84] Bruno Scherrer, et al. Performance bounds for λ policy iteration and application to the game of Tetris, 2013, J. Mach. Learn. Res.
[85] Steven I. Marcus, et al. Simulation-Based Algorithms for Markov Decision Processes, 2013.
[86] Bruno Scherrer, et al. Approximate Dynamic Programming Finally Performs Well in the Game of Tetris, 2013, NIPS.
[87] Dimitri P. Bertsekas, et al. Lambda-Policy Iteration: A Review and a New Implementation, 2013, ArXiv.
[88] Jürgen Schmidhuber, et al. Deep learning in neural networks: An overview, 2014, Neural Networks.
[89] David Silver, et al. Value Iteration with Options and State Aggregation, 2015, ArXiv.
[90] Matthieu Geist, et al. Approximate modified policy iteration and its application to the game of Tetris, 2015, J. Mach. Learn. Res.
[91] Shie Mannor, et al. Approximate Value Iteration with Temporally Extended Actions, 2015, J. Artif. Intell. Res.
[92] Dimitri P. Bertsekas, et al. Convex Optimization Algorithms, 2015.
[93] Nathan S. Netanyahu, et al. DeepChess: End-to-End Deep Neural Network for Automatic Learning in Chess, 2016, ICANN.
[94] Dimitri P. Bertsekas. Proximal Algorithms and Temporal Differences for Large Linear Systems: Extrapolation, Approximation, and Simulation, 2016, ArXiv.
[95] Demis Hassabis, et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, 2017, ArXiv.
[96] Anil A. Bharath, et al. Deep Reinforcement Learning: A Brief Survey, 2017, IEEE Signal Processing Magazine.
[97] Peter Stone, et al. Reinforcement learning, 2019, Scholarpedia.
[98] Yurong Liu, et al. A survey of deep neural network architectures and their applications, 2017, Neurocomputing.
[99] Yuxi Li, et al. Deep Reinforcement Learning: An Overview, 2017, ArXiv.
[100] Dimitri P. Bertsekas, et al. Proximal algorithms and temporal difference methods for solving fixed point problems, 2018, Comput. Optim. Appl.
[101] Yuxi Li, et al. Deep Reinforcement Learning, 2018, Reinforcement Learning for Cyber-Physical Systems.
[102] Joelle Pineau, et al. The Bottleneck Simulator: A Model-based Deep Reinforcement Learning Approach, 2018, J. Artif. Intell. Res.