暂无分享,去创建一个
[1] Kamyar Azizzadenesheli,et al. Reinforcement Learning of POMDPs using Spectral Methods , 2016, COLT.
[2] Shie Mannor,et al. Rotting Bandits , 2017, NIPS.
[3] Shie Mannor,et al. Latent Bandits , 2014, ICML.
[4] Eric Moulines,et al. On Upper-Confidence Bound Policies for Switching Bandit Problems , 2011, ALT.
[5] Yossi Aviv,et al. A Partially Observed Markov Decision Process for Dynamic Pricing , 2005, Manag. Sci..
[6] Alessandro Lazaric,et al. Exploration Bonus for Regret Minimization in Undiscounted Discrete and Continuous Markov Decision Processes , 2018, ArXiv.
[7] Haikady N. Nagaraja,et al. Inference in Hidden Markov Models , 2006, Technometrics.
[8] Aleksandrs Slivkins,et al. Introduction to Multi-Armed Bandits , 2019, Found. Trends Mach. Learn..
[9] Maria L. Gini,et al. Detecting and Forecasting Economic Regimes in Multi-Agent Automated Exchanges , 2007, Decis. Support Syst..
[10] Lai Wei,et al. On Abruptly-Changing and Slowly-Varying Multiarmed Bandit Problems , 2018, 2018 Annual American Control Conference (ACC).
[11] John N. Tsitsiklis,et al. A Structured Multiarmed Bandit Problem and the Greedy Policy , 2008, IEEE Transactions on Automatic Control.
[12] Ronald Ortner,et al. Improved Regret Bounds for Undiscounted Continuous Reinforcement Learning , 2015, ICML.
[13] Lillian J. Ratliff,et al. Multi-Armed Bandits for Correlated Markovian Environments with Smoothed Reward Feedback , 2018, 1803.04008.
[14] Peter Auer,et al. Regret bounds for restless Markov bandits , 2012, Theor. Comput. Sci..
[15] Tor Lattimore,et al. Bounded Regret for Finite-Armed Structured Bandits , 2014, NIPS.
[16] A. V. den Boer,et al. Dynamic Pricing and Learning: Historical Origins, Current Research, and New Directions , 2013 .
[17] Omar Besbes,et al. Stochastic Multi-Armed-Bandit Problem with Non-stationary Rewards , 2014, NIPS.
[18] Samarth Gupta,et al. Correlated Multi-Armed Bandits with A Latent Random Source , 2018, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] C.C. White,et al. Dynamic programming and stochastic control , 1978, Proceedings of the IEEE.
[20] Assaf J. Zeevi,et al. Chasing Demand: Learning and Earning in a Changing Environment , 2016, Math. Oper. Res..
[21] Ronald Ortner,et al. Online Regret Bounds for Undiscounted Continuous Reinforcement Learning , 2012, NIPS.
[22] David Simchi-Levi,et al. Learning to Optimize under Non-Stationarity , 2018, AISTATS.
[23] Peter Auer,et al. Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning , 2006, NIPS.
[24] Luc Leh'ericy,et al. Consistent order estimation for nonparametric hidden Markov models , 2019, Bernoulli.
[25] Rogemar S. Mamon,et al. Hidden Markov Models In Finance , 2007 .
[26] Omar Besbes,et al. Optimal Exploration-Exploitation in a Multi-Armed-Bandit Problem with Non-Stationary Rewards , 2014, Stochastic Systems.
[27] Karl Hinderer,et al. Lipschitz Continuity of Value Functions in Markovian Decision Processes , 2005, Math. Methods Oper. Res..
[28] Anelia Somekh-Baruch,et al. Restless Hidden Markov Bandit with Linear Rewards , 2020, 2020 59th IEEE Conference on Decision and Control (CDC).
[29] Eli Upfal,et al. Adapting to a Changing Environment: the Brownian Restless Bandits , 2008, COLT.
[30] Peng Shi,et al. Approximation algorithms for restless bandit problems , 2007, JACM.
[31] Kazuoki Azuma. WEIGHTED SUMS OF CERTAIN DEPENDENT RANDOM VARIABLES , 1967 .
[32] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[33] Djallel Bouneffouf,et al. A Survey on Practical Applications of Multi-Armed and Contextual Bandits , 2019, ArXiv.
[34] Anima Anandkumar,et al. A Method of Moments for Mixture Models and Hidden Markov Models , 2012, COLT.
[35] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[36] Sébastien Bubeck,et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn..
[37] Shipra Agrawal,et al. Optimistic posterior sampling for reinforcement learning: worst-case regret bounds , 2022, NIPS.
[38] Yohann De Castro,et al. Consistent Estimation of the Filtering and Marginal Smoothing Distributions in Nonparametric Hidden Markov Models , 2015, IEEE Transactions on Information Theory.
[39] Anima Anandkumar,et al. Tensor decompositions for learning latent variable models , 2012, J. Mach. Learn. Res..