Some Recent Applications of Reinforcement Learning

Five relatively recent applications of reinforcement learning methods are described. These examples were chosen to illustrate a diversity of application types, the engineering needed to build applications, and most importantly, the impressive results that these methods are able to achieve. This paper is based on a case-study chapter of the forthcoming second edition of Sutton and Barto’s 1998 book “Reinforcement Learning: An Introduction” [4].

[1] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.

[2] Gerald Tesauro, et al. Simulation, learning, and optimization techniques in Watson's game strategies, 2012, IBM J. Res. Dev.

[3] Engin Ipek, et al. Dynamic Multicore Resource Management: A Machine Learning Approach, 2009, IEEE Micro.

[4] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[5] Gerald Tesauro, et al. On-line Policy Improvement using Monte-Carlo Search, 1996, NIPS.

[6] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.

[7] G. Tesauro. Practical Issues in Temporal Difference Learning, 1992.

[8] Wei Chu, et al. A contextual-bandit approach to personalized news article recommendation, 2010, WWW '10.

[9] Philip S. Thomas, et al. Safe Reinforcement Learning, 2015.

[10] Philip S. Thomas, et al. Ad Recommendation Systems for Life-Time Value Optimization, 2015, WWW.

[11] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.

[12] Gerald Tesauro, et al. Analysis of Watson's Strategies for Playing Jeopardy!, 2013, J. Artif. Intell. Res.

[13] Gerald Tesauro, et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play, 1994, Neural Computation.

[14] Philip S. Thomas, et al. High-Confidence Off-Policy Evaluation, 2015, AAAI.

[15] Onur Mutlu, et al. Self-Optimizing Memory Controllers: A Reinforcement Learning Approach, 2008, 2008 International Symposium on Computer Architecture.