ADS Optimization Using Reinforcement Learning

[1] Michael L. Littman et al. Reinforcement learning improves behaviour from evaluative feedback, 2015, Nature.

[2] Yevgeniy Vorobeychik et al. Adversarial Machine Learning, 2018.

[3] Raphaël Féraud et al. Exploration and exploitation of scratch games, 2013, Machine Learning.

[4] Andrew W. Moore et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.

[5] M. de Rijke et al. Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem, 2013, ICML.

[6] Andrew G. Barto et al. Intrinsic Motivation and Reinforcement Learning, 2013, Intrinsically Motivated Learning in Natural and Artificial Systems.

[7] R. Shepard. Edward Feigenbaum and Julian Feldman (Editors). Computers and Thought. New York: McGraw-Hill, 1963, 1964.

[8] Y. Mei. Sequential change-point detection when unknown parameters are present in the pre-change distribution, 2006, math/0605322.

[9] T. Rydén. On recursive estimation for hidden Markov models, 1997.

[10] Andrew G. Barto et al. Learning and incremental dynamic programming, 1991, Behavioral and Brain Sciences.

[11] David P. M. Scollnik et al. An index sampling algorithm for the Bayesian analysis of a class of model selection problems, 1994.

[12] Christian C. Luhmann. Discounting of delayed rewards is not hyperbolic, 2013, Journal of Experimental Psychology: Learning, Memory, and Cognition.

[13] Murat Kantarcioglu et al. Adversarial Machine Learning, 2018.

[14] Benjamin Van Roy et al. A Tutorial on Thompson Sampling, 2017, Found. Trends Mach. Learn.

[15] J. Ramon et al. Hoeffding’s Inequality for Sums of Dependent Random Variables, 2017.

[16] Yuval Tassa et al. Continuous control with deep reinforcement learning, 2015, ICLR.

[17] Zheng Wen et al. Deep Exploration via Randomized Value Functions, 2017, J. Mach. Learn. Res.

[18] Csaba Szepesvári et al. Exploration-exploitation tradeoff using variance estimates in multi-armed bandits, 2009, Theor. Comput. Sci.

[19] Dana H. Ballard et al. Learning to perceive and act by trial and error, 1991, Machine Learning.

[20] Image-Based Empirical Importance Sampling: An Efficient Way of Estimating Intensities, 2011.

[21] Jasper Snoek et al. Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling, 2018, ICLR.

[22] Csaba Szepesvári et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.

[23] Peter Auer et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.

[24] Yi Li et al. A hybrid recommendation algorithm adapted in e-learning environments, 2012, World Wide Web.

[25] Krishna Bharat et al. Improved Algorithms for Topic Distillation in a Hyperlinked Environment, 1998, SIGIR Forum.