ADS Optimization Using Reinforcement Learning
Preeti Nagrath | Rachna Jain | Anuj Thareja | Sai Tiger Raina | Paras Prakash
[1] Michael L. Littman, et al. Reinforcement learning improves behaviour from evaluative feedback, 2015, Nature.
[2] Yevgeniy Vorobeychik, et al. Adversarial Machine Learning, 2018.
[3] Raphaël Féraud, et al. Exploration and exploitation of scratch games, 2013, Machine Learning.
[4] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[5] M. de Rijke, et al. Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem, 2013, ICML.
[6] Andrew G. Barto, et al. Intrinsic Motivation and Reinforcement Learning, 2013, Intrinsically Motivated Learning in Natural and Artificial Systems.
[7] R. Shepard. Edward Feigenbaum and Julian Feldman (Editors). Computers and Thought. New York: McGraw‐Hill, 1963, 1964.
[8] Y. Mei. Sequential change-point detection when unknown parameters are present in the pre-change distribution, 2006, math/0605322.
[9] T. Rydén. On recursive estimation for hidden Markov models, 1997.
[10] Andrew G. Barto, et al. Learning and incremental dynamic programming, 1991, Behavioral and Brain Sciences.
[11] David P. M. Scollnik, et al. An index sampling algorithm for the Bayesian analysis of a class of model selection problems, 1994.
[12] Christian C. Luhmann. Discounting of delayed rewards is not hyperbolic, 2013, Journal of Experimental Psychology: Learning, Memory, and Cognition.
[13] Murat Kantarcioglu, et al. Adversarial Machine Learning, 2018, Adversarial Machine Learning.
[14] Benjamin Van Roy, et al. A Tutorial on Thompson Sampling, 2017, Found. Trends Mach. Learn.
[15] J. Ramon, et al. Hoeffding’s Inequality for Sums of Dependent Random Variables, 2017.
[16] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[17] Zheng Wen, et al. Deep Exploration via Randomized Value Functions, 2017, J. Mach. Learn. Res.
[18] Csaba Szepesvári, et al. Exploration-exploitation tradeoff using variance estimates in multi-armed bandits, 2009, Theor. Comput. Sci.
[19] Dana H. Ballard, et al. Learning to perceive and act by trial and error, 1991, Machine Learning.
[20] Image‐Based Empirical Importance Sampling: An Efficient Way of Estimating Intensities, 2011.
[21] Jasper Snoek, et al. Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling, 2018, ICLR.
[22] Csaba Szepesvári, et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[23] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.
[24] Yi Li, et al. A hybrid recommendation algorithm adapted in e-learning environments, 2012, World Wide Web.
[25] Krishna Bharat, et al. Improved Algorithms for Topic Distillation in a Hyperlinked Environment, 1998, SIGIR Forum.