
ADS Optimization Using Reinforcement Learning

Preeti Nagrath | Rachna Jain | Anuj Thareja | Sai Tiger Raina | Paras Prakash

[1] Michael L. Littman. Reinforcement learning improves behaviour from evaluative feedback, 2015, Nature.

[2] Yevgeniy Vorobeychik et al. Adversarial Machine Learning, 2018.

[3] Raphaël Féraud et al. Exploration and exploitation of scratch games, 2013, Machine Learning.

[4] Andrew W. Moore et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.

[5] M. de Rijke et al. Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem, 2013, ICML.

[6] Andrew G. Barto et al. Intrinsic Motivation and Reinforcement Learning, 2013, Intrinsically Motivated Learning in Natural and Artificial Systems.

[7] R. Shepard. Edward Feigenbaum and Julian Feldman (Editors). Computers and Thought. New York: McGraw-Hill, 1963, 1964.

[8] Y. Mei. Sequential change-point detection when unknown parameters are present in the pre-change distribution, 2006, math/0605322.

[9] T. Rydén. On recursive estimation for hidden Markov models, 1997.

[10] Andrew G. Barto et al. Learning and incremental dynamic programming, 1991, Behavioral and Brain Sciences.

[11] David P. M. Scollnik et al. An index sampling algorithm for the Bayesian analysis of a class of model selection problems, 1994.

[12] Christian C. Luhmann. Discounting of delayed rewards is not hyperbolic, 2013, Journal of Experimental Psychology: Learning, Memory, and Cognition.

[13] Murat Kantarcioglu et al. Adversarial Machine Learning, 2018, Adversarial Machine Learning.

[14] Benjamin Van Roy et al. A Tutorial on Thompson Sampling, 2017, Found. Trends Mach. Learn.

[15] J. Ramon et al. Hoeffding’s Inequality for Sums of Dependent Random Variables, 2017.

[16] Yuval Tassa et al. Continuous control with deep reinforcement learning, 2015, ICLR.

[17] Zheng Wen et al. Deep Exploration via Randomized Value Functions, 2017, J. Mach. Learn. Res.

[18] Csaba Szepesvári et al. Exploration-exploitation tradeoff using variance estimates in multi-armed bandits, 2009, Theor. Comput. Sci.

[19] Dana H. Ballard et al. Learning to perceive and act by trial and error, 1991, Machine Learning.

[20] Image-Based Empirical Importance Sampling: An Efficient Way of Estimating Intensities, 2011.

[21] Jasper Snoek et al. Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling, 2018, ICLR.

[22] Csaba Szepesvári et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.

[23] Peter Auer et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.

[24] Yi Li et al. A hybrid recommendation algorithm adapted in e-learning environments, 2012, World Wide Web.

[25] Krishna Bharat et al. Improved Algorithms for Topic Distillation in a Hyperlinked Environment, 1998, SIGIR Forum.