Insights in reinforcement learning: formal analysis and empirical evaluation of temporal-difference learning algorithms
[1] J. Jensen. Sur les fonctions convexes et les inégalités entre les valeurs moyennes , 1906 .
[2] R. Fisher,et al. On the Mathematical Foundations of Theoretical Statistics , 1922 .
[3] S. Banach. Sur les opérations dans les ensembles abstraits et leur application aux équations intégrales , 1922 .
[4] E. S. Pearson,et al. ON THE USE AND INTERPRETATION OF CERTAIN TEST CRITERIA FOR PURPOSES OF STATISTICAL INFERENCE PART I , 1928 .
[5] M. Kendall. Statistical Methods for Research Workers , 1937, Nature.
[6] K. Arrow. A Difficulty in the Concept of Social Welfare , 1950, Journal of Political Economy.
[7] K. Arrow,et al. Social Choice and Individual Values , 1951 .
[8] Alfred De Grazia,et al. Mathematical Derivation of an Election System , 1953 .
[9] L. Shapley,et al. Stochastic Games* , 1953, Proceedings of the National Academy of Sciences.
[10] C. Coombs. A theory of data , 1965, Psychological Review.
[11] C. E. Clark. The Greatest of a Finite Set of Random Variables , 1961 .
[12] G. Thompson,et al. The Theory of Committees and Elections. , 1959 .
[13] R. Bellman,et al. Dynamic Programming and Markov Processes , 1960 .
[14] John H. Holland,et al. Outline for a Logical Theory of Adaptive Systems , 1962, JACM.
[15] W. Hoeffding. Probability Inequalities for sums of Bounded Random Variables , 1963 .
[16] S. Gupta,et al. ORDER STATISTICS ARISING FROM INDEPENDENT BINOMIAL POPULATIONS , 1967 .
[17] Arthur E. Bryson,et al. Applied Optimal Control , 1969 .
[18] W. K. Hastings,et al. Monte Carlo Sampling Methods Using Markov Chains and Their Applications , 1970 .
[19] Evan L. Porteus. Some Bounds for Discounted Sequential Decision Processes , 1971 .
[20] E. C. Capen,et al. Competitive Bidding in High-Risk Situations , 1971 .
[21] J. Albus. A Theory of Cerebellar Function , 1971 .
[22] E. J. Sondik,et al. The Optimal Control of Partially Observable Markov Decision Processes. , 1971 .
[23] Edward J. Sondik,et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon , 1973, Oper. Res..
[24] J. H. Smith. AGGREGATION OF PREFERENCES WITH VARIABLE ELECTORATE , 1973 .
[25] Ingo Rechenberg,et al. Evolutionsstrategie : Optimierung technischer Systeme nach Prinzipien der biologischen Evolution , 1973 .
[26] Paramesh Ray. Independence of Irrelevant Alternatives , 1973 .
[27] P. Werbos,et al. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences , 1974 .
[28] Kumpati S. Narendra,et al. Learning Automata - A Survey , 1974, IEEE Trans. Syst. Man Cybern..
[29] John H. Holland,et al. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence , 1992 .
[30] James S. Albus,et al. A New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC) , 1975 .
[31] John A. Hartigan,et al. Clustering Algorithms , 1975 .
[32] Jo van Nunen,et al. A set of successive approximation methods for discounted Markovian decision problems , 1976, Math. Methods Oper. Res..
[33] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Vol. II , 1976 .
[34] K. Appel,et al. Every planar map is four colorable. Part II: Reducibility , 1977 .
[35] P. Fishburn. Condorcet Social Choice Functions , 1977 .
[36] J. Wal. Discounted Markov games: Generalized policy iteration method , 1978 .
[37] M. Puterman,et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems , 1978 .
[38] B. Arnold,et al. Bounds on Expectations of Linear Systematic Statistics Based on Dependent Samples , 1979 .
[39] Philip D. Straffin,et al. Topics in the theory of voting , 1980 .
[40] Peter C. Fishburn,et al. Monotonicity paradoxes in the theory of elections , 1982, Discret. Appl. Math..
[41] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[42] Scott Kirkpatrick,et al. Optimization by simulated annealing: Quantitative studies , 1984 .
[43] R. Niemi. The Problem of Strategic Behavior under Approval Voting , 1984, American Political Science Review.
[44] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
[45] T. Aven. Upper (lower) bounds on the mean of the maximum (minimum) of a number of random variables , 1985, Journal of Applied Probability.
[46] Geoffrey E. Hinton,et al. Learning internal representations by error propagation , 1986 .
[47] Barak A. Pearlmutter,et al. G-maximization: An unsupervised learning procedure for discovering regularities , 1987 .
[48] J. McCarthy,et al. Some philosophical problems from the standpoint of artificial intelligence , 1987 .
[49] T. Tideman,et al. Independence of clones as a criterion for voting rules , 1987 .
[50] P. W. Jones,et al. Bandit Problems, Sequential Allocation of Experiments , 1987 .
[51] Bernard Widrow,et al. Adaptive switching circuits , 1988 .
[52] D. Saari,et al. The problem of indeterminacy in approval, multiple, and truncated voting systems , 1988 .
[53] R. Thaler. Anomalies: The Winner's Curse , 1988 .
[54] Paul J. Werbos,et al. Neural networks for control and system identification , 1989, Proceedings of the 28th IEEE Conference on Decision and Control.
[55] Kumpati S. Narendra,et al. Learning automata - an introduction , 1989 .
[56] C. Watkins. Learning from delayed rewards , 1989 .
[57] C.W. Anderson,et al. Learning to control an inverted pendulum using neural networks , 1989, IEEE Control Systems Magazine.
[58] P. J. Werbos,et al. Backpropagation and neurocontrol: a review and prospectus , 1989, International 1989 Joint Conference on Neural Networks.
[59] Lawrence D. Jackel,et al. Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.
[60] Ronald J. Williams,et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks , 1989, Neural Computation.
[61] Paul J. Werbos,et al. Backpropagation Through Time: What It Does and How to Do It , 1990, Proc. IEEE.
[62] Lawrence. Davis,et al. Handbook Of Genetic Algorithms , 1990 .
[63] Paul J. Werbos,et al. Consistency of HDP applied to a simple reinforcement learning problem , 1990, Neural Networks.
[64] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[65] Daniel C. Dennett,et al. Cognitive Wheels: The Frame Problem of AI , 1990, The Philosophy of Artificial Intelligence.
[66] A. M. Turing,et al. Computing Machinery and Intelligence , 1950, The Philosophy of Artificial Intelligence.
[67] Alex Pentland,et al. Face recognition using eigenfaces , 1991, Proceedings. 1991 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[68] R. A. Brooks,et al. Intelligence without Representation , 1991, Artif. Intell..
[69] A. P. Wieland,et al. Evolving neural network controllers for unstable systems , 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.
[70] W. Lovejoy. A survey of algorithmic methods for partially observed Markov decision processes , 1991 .
[71] Hyongsuk Kim,et al. CMAC-based adaptive critic self-learning control , 1991, IEEE Trans. Neural Networks.
[72] Geoffrey E. Hinton,et al. Adaptive Mixtures of Local Experts , 1991, Neural Computation.
[73] Satinder P. Singh,et al. The Efficient Learning of Multiple Task Sequences , 1991, NIPS.
[74] Long-Ji Lin,et al. Reinforcement learning for robots using neural networks , 1992 .
[75] P. Dayan. The Convergence of TD(λ) for General λ , 2004, Machine Learning.
[76] G. Tesauro. Practical Issues in Temporal Difference Learning , 1992 .
[77] John H. Holland,et al. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence , 1992 .
[78] Anton Schwartz,et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.
[79] Heekuck Oh,et al. Neural Networks for Pattern Recognition , 1993, Adv. Comput..
[80] Lambert Schomaker,et al. Using stroke- or character-based self-organizing maps in the recognition of on-line, connected cursive script , 1993, Pattern Recognit..
[81] Thomas Bäck,et al. An Overview of Evolutionary Algorithms for Parameter Optimization , 1993, Evolutionary Computation.
[82] Leemon C Baird,et al. Reinforcement Learning With High-Dimensional, Continuous Actions , 1993 .
[83] Gerald Tesauro,et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.
[84] Michael I. Jordan,et al. MIT Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences , 1996 .
[85] Sargur N. Srihari,et al. Decision Combination in Multiple Classifier Systems , 1994, IEEE Trans. Pattern Anal. Mach. Intell..
[86] Isabelle Guyon,et al. Comparison of classifier methods: a case study in handwritten digit recognition , 1994, Proceedings of the 12th IAPR International Conference on Pattern Recognition, Vol. 3 - Conference C: Signal Processing (Cat. No.94CH3440-5).
[87] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[88] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[89] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[90] Cullen Schaffer,et al. A Conservation Law for Generalization Performance , 1994, ICML.
[91] Michael I. Jordan,et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems , 1994, NIPS.
[92] John N. Tsitsiklis,et al. Asynchronous stochastic approximation and Q-learning , 1994, Mach. Learn..
[93] Ariel Rubinstein,et al. A Course in Game Theory , 1995 .
[94] Claude-Nicolas Fiechter,et al. Efficient reinforcement learning , 1994, COLT '94.
[95] Neil J. Calkin. A curious binomial identity , 1994, Discret. Math..
[96] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[97] N. Papadatos. Maximum variance of order statistics , 1995 .
[98] Peter Norvig,et al. Artificial Intelligence: A Modern Approach , 1995 .
[99] Chen K. Tham,et al. Reinforcement learning of multiple tasks using a hierarchical CMAC architecture , 1995, Robotics Auton. Syst..
[100] Gerald Tesauro,et al. Temporal difference learning and TD-Gammon , 1995, CACM.
[101] Stuart J. Russell,et al. Approximating Optimal Policies for Partially Observable Stochastic Domains , 1995, IJCAI.
[102] Yoshua Bengio,et al. Pattern Recognition and Neural Networks , 1995 .
[103] D. Wolpert,et al. No Free Lunch Theorems for Search , 1995 .
[104] N. Tideman. The Single Transferable Vote , 1995 .
[105] Andrew G. Barto,et al. Improving Elevator Performance Using Reinforcement Learning , 1995, NIPS.
[106] James Kennedy,et al. Particle swarm optimization , 2002, Proceedings of ICNN'95 - International Conference on Neural Networks.
[107] Richard S. Sutton,et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding , 1996 .
[108] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[109] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..
[110] Arthur C. Graesser,et al. Is it an Agent, or Just a Program?: A Taxonomy for Autonomous Agents , 1996, ATAL.
[111] Csaba Szepesvári,et al. A Generalized Reinforcement-Learning Model: Convergence and Applications , 1996, ICML.
[112] Yoav Freund,et al. Experiments with a New Boosting Algorithm , 1996, ICML.
[113] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[114] Thomas G. Dietterich. What is machine learning? , 2020, Archives of Disease in Childhood.
[115] Thomas Bäck,et al. Evolutionary algorithms in theory and practice - evolution strategies, evolutionary programming, genetic algorithms , 1996 .
[116] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.
[117] Andrew G. Barto,et al. Linear Least-Squares Algorithms for Temporal Difference Learning , 2005, Machine Learning.
[118] Risto Miikkulainen,et al. Efficient Reinforcement Learning through Symbiotic Evolution , 1996, Machine Learning.
[119] Judy A. Franklin,et al. Biped dynamic walking using reinforcement learning , 1997, Robotics Auton. Syst..
[120] Claude F. Touzet,et al. Neural reinforcement learning for behaviour synthesis , 1997, Robotics Auton. Syst..
[121] Ashwin Ram,et al. Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces , 1997, Adapt. Behav..
[122] David H. Wolpert,et al. No free lunch theorems for optimization , 1997, IEEE Trans. Evol. Comput..
[123] Csaba Szepesvári,et al. The Asymptotic Convergence-Rate of Q-learning , 1997, NIPS.
[124] G. Saridis,et al. Approximate Solutions to the Time-Invariant Hamilton–Jacobi–Bellman Equation , 1998 .
[125] Leslie Pack Kaelbling,et al. Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..
[126] Doina Precup,et al. Intra-Option Learning about Temporally Abstract Actions , 1998, ICML.
[127] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.
[128] Michael Kearns,et al. Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms , 1998, NIPS.
[129] Simon Haykin,et al. Neural Networks: A Comprehensive Foundation , 1998 .
[130] Robin R. Murphy,et al. Artificial intelligence and mobile robots: case studies of successful robot systems , 1998 .
[131] T. Kohonen,et al. Visual Explorations in Finance with Self-Organizing Maps , 1998 .
[132] Preben Alstrøm,et al. Learning to Drive a Bicycle Using Reinforcement Learning and Shaping , 1998, ICML.
[133] A. Cassandra,et al. Exact and approximate algorithms for partially observable Markov decision processes , 1998 .
[134] Ron Sun,et al. Multi-agent reinforcement learning: weighting and partitioning , 1999, Neural Networks.
[135] Geoffrey J. Gordon,et al. Approximate solutions to markov decision processes , 1999 .
[136] Craig Boutilier,et al. Decision-Theoretic Planning: Structural Assumptions and Computational Leverage , 1999, J. Artif. Intell. Res..
[137] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[138] Csaba Szepesvári,et al. A Unified Analysis of Value-Function-Based Reinforcement-Learning Algorithms , 1999, Neural Computation.
[139] John J. Grefenstette,et al. Evolutionary Algorithms for Reinforcement Learning , 1999, J. Artif. Intell. Res..
[140] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[141] Vivek S. Borkar,et al. Actor-Critic - Type Learning Algorithms for Markov Decision Processes , 1999, SIAM J. Control. Optim..
[142] Terrence J. Sejnowski,et al. Unsupervised Learning , 2018, Encyclopedia of GIS.
[143] Alexander Zelinsky,et al. Q-Learning in Continuous State and Action Spaces , 1999, Australian Joint Conference on Artificial Intelligence.
[144] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[145] Vladimir N. Vapnik,et al. The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.
[146] R. Richie,et al. Instant Runoffs: A Cheaper, Fairer, Better Way to Conduct Elections , 2000 .
[147] Josef Kittler,et al. Combining multiple classifiers by averaging or by multiplying? , 2000, Pattern Recognit..
[148] Lambert Schomaker,et al. Variants of the Borda count method for combining ranked classifier hypotheses , 2000 .
[149] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[150] Yishay Mansour,et al. Learning Rates for Q-learning , 2004, J. Mach. Learn. Res..
[151] Sanjoy Dasgupta,et al. Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.
[152] D. Farrell. Electoral Systems: A Comparative Introduction , 2001 .
[153] Shie Mannor,et al. PAC Bounds for Multi-armed Bandit and Markov Decision Processes , 2002, COLT.
[154] Rémi Coulom,et al. Reinforcement Learning Using Neural Networks, with Applications to Motor Control. (Apprentissage par renforcement utilisant des réseaux de neurones, avec des applications au contrôle moteur) , 2002 .
[155] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[156] Leslie Pack Kaelbling,et al. Effective reinforcement learning for mobile robots , 2002, Proceedings 2002 IEEE International Conference on Robotics and Automation (Cat. No.02CH37292).
[157] George G. Lendaris,et al. Adaptive dynamic programming , 2002, IEEE Trans. Syst. Man Cybern. Part C.
[158] Petros Koumoutsakos,et al. Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES) , 2003, Evolutionary Computation.
[159] Stefan Schaal,et al. Reinforcement Learning for Humanoid Robotics , 2003 .
[160] Tony R. Martinez,et al. The general inefficiency of batch training for gradient descent learning , 2003, Neural Networks.
[161] John N. Tsitsiklis,et al. The Sample Complexity of Exploration in the Multi-Armed Bandit Problem , 2004, J. Mach. Learn. Res..
[162] Dimitri P. Bertsekas,et al. Least Squares Policy Evaluation Algorithms with Linear Function Approximation , 2003, Discret. Event Dyn. Syst..
[163] A. E. Eiben,et al. Introduction to Evolutionary Computing , 2003, Natural Computing Series.
[164] Peter L. Bartlett,et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning , 2001, J. Mach. Learn. Res..
[165] L. D. Whitley. Genetic reinforcement learning for neurocontrol problems , 1993, Machine Learning.
[166] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[167] B. Grofman,et al. If you like the alternative vote (a.k.a. the instant runoff), then you ought to know about the Coombs rule , 2004 .
[168] D. Ernst,et al. Power systems stability control: reinforcement learning framework , 2004, IEEE Transactions on Power Systems.
[169] Richard S. Sutton,et al. Reinforcement learning with replacing eligibility traces , 2004, Machine Learning.
[170] Tommi S. Jaakkola,et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms , 2000, Machine Learning.
[171] Satinder Singh,et al. An upper bound on the loss from approximate optimal-value functions , 1994, Machine Learning.
[172] P. Dayan,et al. TD(λ) converges with probability 1 , 2004, Machine Learning.
[173] E. Steen. Rational Overoptimism (and Other Biases) , 2004 .
[174] Jürgen Schmidhuber,et al. Fast Online Q(λ) , 1998, Machine Learning.
[175] R. J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[176] Justin A. Boyan,et al. Technical Update: Least-Squares Temporal Difference Learning , 2002, Machine Learning.
[177] Andrew G. Barto,et al. Elevator Group Control Using Multiple Reinforcement Learning Agents , 1998, Machine Learning.
[178] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[179] A. Barto,et al. Improved Temporal Difference Methods with Linear Function Approximation , 2004 .
[180] William D. Smart,et al. Interpolation-based Q-learning , 2004, ICML.
[181] Jing Peng,et al. Incremental multi-step Q-learning , 1994, Machine Learning.
[182] Sean R Eddy,et al. What is dynamic programming? , 2004, Nature Biotechnology.
[183] Axel Wismüller,et al. Tumor feature visualization with unsupervised learning , 2005, Medical Image Anal..
[184] Peter Stone,et al. Function Approximation via Tile Coding: Automating Parameter Choice , 2005, SARA.
[185] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.
[186] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[187] Ashutosh Saxena,et al. High speed obstacle avoidance using monocular vision and reinforcement learning , 2005, ICML.
[188] Michael L. Littman,et al. A theoretical analysis of Model-Based Interval Estimation , 2005, ICML.
[189] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[190] SRIDHAR MAHADEVAN,et al. Average Reward Reinforcement Learning: Foundations, Algorithms, and Empirical Results , 2005, Machine Learning.
[191] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[192] Marco Wiering. QV(λ)-learning: A New On-policy Reinforcement Learning Algorithm , 2005 .
[193] Mohammad Bagher Naghibi-Sistani,et al. Application of Q-learning with temperature variation for bidding strategies in market based power systems , 2006 .
[194] Lihong Li,et al. PAC model-free reinforcement learning , 2006, ICML.
[195] Robert L. Winkler,et al. The Optimizer's Curse: Skepticism and Postdecision Surprise in Decision Analysis , 2006, Manag. Sci..
[196] L. Buşoniu. Evolutionary function approximation for reinforcement learning , 2006 .
[197] Shimon Whiteson,et al. Comparing evolutionary and temporal difference methods in a reinforcement learning domain , 2006, GECCO.
[198] Mohamed S. Kamel,et al. Aggregation of Reinforcement Learning Algorithms , 2006, The 2006 IEEE International Joint Conference on Neural Network Proceedings.
[199] K. Natarajan,et al. Tight Bounds on Expected Order Statistics , 2006 .
[200] Alborz Geramifard,et al. Incremental Least-Squares Temporal Difference Learning , 2006, AAAI.
[201] Shimon Whiteson,et al. Empirical Studies in Action Selection with Reinforcement Learning , 2007, Adapt. Behav..
[202] H. Robbins. A Stochastic Approximation Method , 1951 .
[203] Csaba Szepesvári,et al. Fitted Q-iteration in continuous action-space MDPs , 2007, NIPS.
[204] Radford M. Neal. Pattern Recognition and Machine Learning , 2007, Technometrics.
[205] John N. Tsitsiklis,et al. Bias and Variance Approximation in Value Function Estimates , 2007, Manag. Sci..
[206] George E. Monahan,et al. A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms , 2007 .
[207] Deepayan Chakrabarti,et al. Multi-armed bandit problems with dependent arms , 2007, ICML '07.
[208] M.A. Wiering,et al. Reinforcement Learning in Continuous Action Spaces , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[209] M.A. Wiering,et al. Convergence of Model-Based Temporal Difference Learning for Control , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[210] H. Robbins. Some aspects of the sequential design of experiments , 1952 .
[211] Warren B. Powell,et al. Approximate Dynamic Programming - Solving the Curses of Dimensionality , 2007 .
[212] M.A. Wiering,et al. Two Novel On-policy Reinforcement Learning Algorithms based on TD(λ)-methods , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[213] Kary Främling. Replacing eligibility trace for action-value learning with function approximation , 2007, ESANN.
[214] Martin A. Riedmiller,et al. Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[215] R. Sutton,et al. A convergent O ( n ) algorithm for off-policy temporal-difference learning with linear function approximation , 2008, NIPS 2008.
[216] K. Conn,et al. Towards Affect-sensitive Assistive Intervention Technologies for Children with Autism , 2008, RO-MAN 2008.
[217] Steffen Udluft,et al. Safe exploration for reinforcement learning , 2008, ESANN.
[218] Marco Wiering,et al. Ensemble Algorithms in Reinforcement Learning , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).
[219] B. Arnold,et al. A first course in order statistics , 2008 .
[220] Changchun Liu,et al. Online Affect Detection and Robot Behavior Adaptation for Intervention of Children With Autism , 2008, IEEE Transactions on Robotics.
[221] Frank Dignum,et al. On-line adapting games using agent organizations , 2008, 2008 IEEE Symposium On Computational Intelligence and Games.
[222] Tom Schaul,et al. Natural Evolution Strategies , 2008, 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence).
[223] Christian Igel,et al. Evolution Strategies for Direct Policy Search , 2008, PPSN.
[224] Stefan Schaal,et al. 2008 Special Issue: Reinforcement learning of motor skills with policy gradients , 2008 .
[225] Marc Schoenauer,et al. Supervised and Evolutionary Learning of Echo State Networks , 2008, PPSN.
[226] Huaguang Zhang,et al. Adaptive Dynamic Programming: An Introduction , 2009, IEEE Computational Intelligence Magazine.
[227] Tom Schaul,et al. Efficient natural evolution strategies , 2009, GECCO.
[228] Marco Wiering,et al. The QV family compared to other reinforcement learning algorithms , 2009, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[229] Frank Dignum,et al. Adaptive Serious Games Using Agent Organizations , 2009, AGS.
[230] Lihong Li,et al. Reinforcement Learning in Finite MDPs: PAC Analysis , 2009, J. Mach. Learn. Res..
[231] Frank L. Lewis,et al. Adaptive optimal control for continuous-time linear systems based on policy iteration , 2009, Autom..
[232] Lihong Li,et al. Workshop summary: Results of the 2009 reinforcement learning competition , 2009, ICML '09.
[233] Shimon Whiteson,et al. A theoretical and empirical analysis of Expected Sarsa , 2009, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[234] Marco Wiering,et al. Using continuous action spaces to solve discrete problems , 2009, 2009 International Joint Conference on Neural Networks.
[235] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[236] Robert Tibshirani,et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition , 2001, Springer Series in Statistics.
[237] Shalabh Bhatnagar,et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation , 2009, NIPS.
[238] Michail G. Lagoudakis,et al. Binary action search for learning continuous-action control policies , 2009, ICML '09.
[239] Tom Schaul,et al. Exponential natural evolution strategies , 2010, GECCO '10.
[240] Hado van Hasselt,et al. Double Q-learning , 2010, NIPS.
[241] Shalabh Bhatnagar,et al. Toward Off-Policy Learning Control with Function Approximation , 2010, ICML.
[242] Steffen Udluft,et al. Ensembles of Neural Networks for Robust Reinforcement Learning , 2010, 2010 Ninth International Conference on Machine Learning and Applications.
[243] Tom Schaul,et al. Exploring parameter space in reinforcement learning , 2010, Paladyn J. Behav. Robotics.
[244] Andrew M. Ross. Computing Bounds on the Expected Maximum of Correlated Normal Variables , 2010 .
[245] Frank Sehnke,et al. Parameter-exploring policy gradients , 2010, Neural Networks.
[246] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[247] R. Sutton,et al. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces , 2010 .
[248] Donald Michie,et al. BOXES: AN EXPERIMENT IN ADAPTIVE CONTROL , 2013 .