Policy Gradients for Contextual Recommendations
Fuzhen Zhuang | Qing He | Pingzhong Tang | Feiyang Pan | Qingpeng Cai
[1] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res.
[2] John Langford,et al. The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information , 2007, NIPS.
[3] Shipra Agrawal,et al. Thompson Sampling for Contextual Bandits with Linear Payoffs , 2012, ICML.
[4] Yiwei Zhang,et al. Reinforcement Mechanism Design for e-commerce , 2017, WWW.
[5] Wei Chu,et al. A contextual-bandit approach to personalized news article recommendation , 2010, WWW '10.
[6] Andreas Krause,et al. Contextual Gaussian Process Bandit Optimization , 2011, NIPS.
[7] Wei Chu,et al. Contextual Bandits with Linear Payoff Functions , 2011, AISTATS.
[8] P. Whittle. Restless bandits: activity allocation in a changing world , 1988, Journal of Applied Probability.
[9] Lantao Yu,et al. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient , 2016, AAAI.
[10] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[11] Philip M. Long,et al. Reinforcement Learning with Immediate Rewards and Linear Hypotheses , 2003, Algorithmica.
[12] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[13] Robert Babuska,et al. Experience Replay for Real-Time Reinforcement Learning Control , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[14] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998.
[15] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[16] Csaba Szepesvári,et al. Improved Algorithms for Linear Stochastic Bandits , 2011, NIPS.
[17] David S. Leslie,et al. Optimistic Bayesian Sampling in Contextual-Bandit Problems , 2012, J. Mach. Learn. Res.
[18] Andreas Krause,et al. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting , 2009, IEEE Transactions on Information Theory.
[19] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[20] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples , 1933.
[21] Liang Tang,et al. Automatic ad format selection via contextual bandits , 2013, CIKM.
[22] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[23] Lihong Li,et al. An Empirical Evaluation of Thompson Sampling , 2011, NIPS.
[24] Liang Tang,et al. Ensemble contextual bandits for personalized recommendation , 2014, RecSys '14.
[25] Zoubin Ghahramani,et al. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning , 2015, ICML.
[26] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[27] Ahmad A. Kardan,et al. A hybrid web recommender system based on Q-learning , 2008, SAC '08.
[28] Guy Shani,et al. An MDP-Based Recommender System , 2002, J. Mach. Learn. Res.
[29] Liang Tang,et al. Personalized Recommendation via Parameter-Free Contextual Bandits , 2015, SIGIR.
[30] Filip Radlinski,et al. Ranked bandits in metric spaces: learning diverse rankings over large document collections , 2013, J. Mach. Learn. Res.
[31] David Silver,et al. Memory-based control with recurrent neural networks , 2015, ArXiv.
[32] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[33] J. Langford,et al. The Epoch-Greedy algorithm for contextual multi-armed bandits , 2007, NIPS 2007.
[34] Aurélien Garivier,et al. Parametric Bandits: The Generalized Linear Case , 2010, NIPS.
[35] Alda Lopes Gançarski,et al. A Contextual-Bandit Algorithm for Mobile Context-Aware Recommender System , 2012, ICONIP.