Optimal recommendation to users that react: Online learning for a class of POMDPs
[1] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[2] Qing Zhao,et al. Indexability of Restless Bandit Problems and Optimality of Whittle Index for Dynamic Multichannel Access , 2008, IEEE Transactions on Information Theory.
[3] Robin Burke,et al. Context-aware music recommendation based on latent topic sequential patterns , 2012, RecSys.
[4] Marc Lelarge,et al. Leveraging Side Observations in Stochastic Bandits , 2012, UAI.
[5] Csaba Szepesvári,et al. Improved Algorithms for Linear Stochastic Bandits , 2011, NIPS.
[6] J. Langford,et al. The Epoch-Greedy algorithm for contextual multi-armed bandits , 2007, NIPS.
[7] Tara Javidi,et al. Optimality of myopic policy for a class of monotone affine restless multi-armed bandits , 2012, 2012 IEEE 51st IEEE Conference on Decision and Control (CDC).
[8] Shipra Agrawal,et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem , 2011, COLT.
[9] Shie Mannor,et al. Thompson Sampling for Complex Online Problems , 2013, ICML.
[10] D. Manjunath,et al. A restless bandit with no observable states for recommendation systems and communication link scheduling , 2015, 2015 54th IEEE Conference on Decision and Control (CDC).
[11] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[12] Michael J. Neely,et al. Network utility maximization over partially observable Markovian channels , 2013, Perform. Evaluation.
[13] Wei Chu,et al. A contextual-bandit approach to personalized news article recommendation , 2010, WWW '10.
[14] Shie Mannor,et al. Thompson Sampling for Learning Parameterized Markov Decision Processes , 2014, COLT.
[15] W. Rudin. Principles of mathematical analysis , 1964 .
[16] Michael J. Neely,et al. Network utility maximization over partially observable Markovian channels , 2010, 2011 International Symposium of Modeling and Optimization of Mobile, Ad Hoc, and Wireless Networks.
[17] Vivek S. Borkar,et al. Whittle index policy for crawling ephemeral content , 2015, 2015 54th IEEE Conference on Decision and Control (CDC).
[18] Benjamin Van Roy,et al. Model-based Reinforcement Learning and the Eluder Dimension , 2014, NIPS.
[19] Gábor Lugosi,et al. Prediction, learning, and games , 2006 .
[20] W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples , 1933 .
[21] D. Manjunath,et al. On the Whittle Index for Restless Multiarmed Hidden Markov Bandits , 2016, IEEE Transactions on Automatic Control.