[1] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[2] Shie Mannor, et al. Contextual Markov Decision Processes, 2015, ArXiv.
[3] Kamyar Azizzadenesheli, et al. Reinforcement Learning of POMDPs using Spectral Methods, 2016, COLT.
[4] Milos Hauskrecht, et al. Hierarchical Solution of Markov Decision Processes using Macro-actions, 1998, UAI.
[5] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[6] Le Song, et al. Nonparametric Estimation of Multi-View Latent Variable Models, 2013, ICML.
[7] Anima Anandkumar, et al. Tensor decompositions for learning latent variable models, 2012, J. Mach. Learn. Res.
[8] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.
[9] Shuai Li, et al. Online Clustering of Bandits, 2014, ICML.
[10] Aditya Gopalan, et al. Low-rank Bandits with Latent Mixtures, 2016, ArXiv.
[11] J. Tropp. Freedman's Inequality for Matrix Martingales, 2011, arXiv:1101.3039.
[12] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[13] John Langford, et al. Contextual-MDPs for PAC-Reinforcement Learning with Rich Observations, 2016, ArXiv.
[14] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.