论文信息 - Introduction to Reinforcement Learning

Introduction to Reinforcement Learning

This chapter introduces Markov Decision Processes and the classical methods of dynamic programming, before building familiarity with the ideas of reinforcement learning and other approximate methods for solving MDPs. After describing Bellman optimality and iterative value and policy updates before moving to Q-learning, the chapter quickly advances towards a more engineering style exposition of the topic, covering key computational concepts such as greediness, batch learning, and Q-learning. Through a number of mini-case studies, the chapter provides insight into how RL is applied to optimization problems in asset management and trading.

Matthew F. Dixon | Igor Halperin | Paul Bilokon

[1] Csaba Szepesvári,et al. A Generalized Reinforcement-Learning Model: Convergence and Applications , 1996, ICML.

[2] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.

[3] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.

[4] R. Bellman. Dynamic programming. , 1957, Science.

[5] W. R. Thompson. On a Criterion for the Rejection of Observations and the Distribution of the Ratio of Deviation to Sample Standard Deviation , 1935 .

[6] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..

[7] W. R. Thompson. ON THE LIKELIHOOD THAT ONE UNKNOWN PROBABILITY EXCEEDS ANOTHER IN VIEW OF THE EVIDENCE OF TWO SAMPLES , 1933 .

[8] Kavosh Asadi,et al. An Alternative Softmax Operator for Reinforcement Learning , 2016, ICML.

[9] H. Robbins. A Stochastic Approximation Method , 1951 .

[10] Hado van Hasselt,et al. Double Q-learning , 2010, NIPS.