Graphical Models Meet Bandits: A Variational Thompson Sampling Approach
Ole J. Mengshoel | Branislav Kveton | Zheng Wen | Tong Yu | Ruiyi Zhang