[1] Apostolos Burnetas, et al. Optimal Adaptive Policies for Markov Decision Processes, 1997, Math. Oper. Res.
[2] Andreas Krause, et al. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting, 2009, IEEE Transactions on Information Theory.
[3] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[4] Steven L. Scott, et al. Multi-armed bandit experiments in the online service economy, 2015.
[5] S. Walker, et al. Extending Doob's consistency theorem to nonparametric densities, 2004.
[6] Michael I. Jordan, et al. Hierarchical Bayesian Nonparametric Models with Applications, 2008.
[7] Y. Teh, et al. Multi-Armed Bandit for Species Discovery: A Bayesian Nonparametric Approach, 2018.
[8] Yi Ouyang, et al. Learning Unknown Markov Decision Processes: A Thompson Sampling Approach, 2017, NIPS.
[9] D. Dunson, et al. Nonparametric Bayesian density estimation on manifolds with applications to planar shapes, 2010, Biometrika.
[10] M. Escobar, et al. Markov Chain Sampling Methods for Dirichlet Process Mixture Models, 2000.
[11] Michael I. Jordan, et al. Hierarchical Dirichlet Processes, 2006.
[12] A. W. van der Vaart, et al. Entropies and rates of convergence for maximum likelihood and Bayes estimation for mixtures of normal densities, 2001.
[13] Benjamin Van Roy, et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[14] Carl E. Rasmussen, et al. Gaussian Processes for Machine Learning, 2005, Adaptive Computation and Machine Learning.
[15] H. Robbins, et al. Asymptotically efficient adaptive allocation rules, 1985.
[16] Shie Mannor, et al. Thompson Sampling for Learning Parameterized Markov Decision Processes, 2014, COLT.
[17] David B. Dunson, et al. Posterior consistency in conditional distribution estimation, 2013, J. Multivar. Anal.
[18] A. W. van der Vaart, et al. Convergence rates of posterior distributions, 2000.
[19] Benjamin Van Roy, et al. A Tutorial on Thompson Sampling, 2017, Found. Trends Mach. Learn.
[20] Eyke Hüllermeier, et al. On the Bayes-optimality of F-measure maximizers, 2013, J. Mach. Learn. Res.
[21] Michael I. Jordan, et al. Bayesian Nonparametrics: Hierarchical Bayesian nonparametric models with applications, 2010.
[22] Benjamin Van Roy, et al. An Information-Theoretic Analysis of Thompson Sampling, 2014, J. Mach. Learn. Res.
[23] Samuel J. Gershman, et al. A Tutorial on Bayesian Nonparametric Models, 2011, arXiv:1106.2697.
[24] Shipra Agrawal, et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem, 2011, COLT.
[25] Sham Kakade, et al. An Optimal Algorithm for Linear Bandits, 2011, arXiv.
[26] Iñigo Urteaga, et al. Variational inference for the multi-armed contextual bandit, 2017, AISTATS.
[27] Andreas Krause, et al. Contextual Gaussian Process Bandit Optimization, 2011, NIPS.
[28] Csaba Szepesvári, et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.
[29] Wei Chu, et al. A contextual-bandit approach to personalized news article recommendation, 2010, WWW '10.
[30] Iñigo Urteaga, et al. (Sequential) Importance Sampling Bandits, 2018, arXiv.
[31] Julien Cornebise, et al. Weight Uncertainty in Neural Networks, 2015, ICML.
[32] W. R. Thompson. On the Theory of Apportionment, 1935.
[33] H. Robbins. Some aspects of the sequential design of experiments, 1952.
[34] Michalis K. Titsias, et al. Variational Learning of Inducing Variables in Sparse Gaussian Processes, 2009, AISTATS.
[35] John Langford, et al. The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information, 2007, NIPS.
[36] J. Ghosh, et al. Posterior Consistency of Dirichlet Mixtures in Density Estimation, 1999.
[37] Diederik P. Kingma, et al. Variational Dropout and the Local Reparameterization Trick, 2015, NIPS.
[38] Shipra Agrawal, et al. Further Optimal Regret Bounds for Thompson Sampling, 2012, AISTATS.
[39] Wei Chu, et al. A case study of behavior-driven conjoint analysis on Yahoo!: front page today module, 2009, KDD.
[40] Rémi Munos, et al. A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences, 2011, COLT.
[41] Zoubin Ghahramani, et al. Sparse Gaussian Processes using Pseudo-inputs, 2005, NIPS.
[42] Julien Cornebise, et al. Weight Uncertainty in Neural Networks, 2015, arXiv.
[43] Lihong Li, et al. An Empirical Evaluation of Thompson Sampling, 2011, NIPS.
[44] Jasper Snoek, et al. Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling, 2018, ICLR.
[45] Abhijit Gosavi, et al. Reinforcement Learning: A Tutorial Survey and Recent Advances, 2009, INFORMS J. Comput.
[46] Lawrence Carin, et al. Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks, 2015, AAAI.
[47] Steven L. Scott, et al. A modern Bayesian look at the multi-armed bandit, 2010.
[48] Benjamin Van Roy, et al. Learning to Optimize via Posterior Sampling, 2013, Math. Oper. Res.
[49] John Shawe-Taylor, et al. Regret Bounds for Gaussian Process Bandit Problems, 2010, AISTATS.
[50] A. W. van der Vaart, et al. Posterior convergence rates of Dirichlet mixtures at smooth densities, 2007, arXiv:0708.1885.
[51] Barbara E. Engelhardt, et al. PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits, 2018, NeurIPS.
[52] Shipra Agrawal, et al. Thompson Sampling for Contextual Bandits with Linear Payoffs, 2012, ICML.
[53] John N. Tsitsiklis, et al. Linearly Parameterized Bandits, 2008, Math. Oper. Res.
[54] Wei Chu, et al. Contextual Bandits with Linear Payoff Functions, 2011, AISTATS.
[55] Rémi Munos, et al. Thompson Sampling for 1-Dimensional Exponential Family Bandits, 2013, NIPS.