Sample-Efficient Reinforcement Learning of Undercomplete POMDPs
[1] Marcel Paul Schützenberger, et al. On the Definition of a Family of Automata, 1961, Inf. Control.
[2] Jack W. Carlyle, et al. Realizations by Stochastic Finite Automata, 1971, J. Comput. Syst. Sci.
[3] John N. Tsitsiklis, et al. The Complexity of Markov Decision Processes, 1987, Math. Oper. Res.
[4] Leslie Pack Kaelbling, et al. Acting under uncertainty: discrete Bayesian models for mobile-robot navigation, 1996, Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '96).
[5] Milos Hauskrecht, et al. Planning treatment of ischemic heart disease with partially observable Markov decision processes, 2000, Artif. Intell. Medicine.
[6] Herbert Jaeger, et al. Observable Operator Models for Discrete Stochastic Time Series, 2000, Neural Computation.
[7] Eric Allender, et al. Complexity of finite-horizon Markov decision process problems, 2000, JACM.
[8] Richard S. Sutton, et al. Predictive Representations of State, 2001, NIPS.
[9] H. Jaeger. Discrete-time, discrete-valued observable operator models: a tutorial, 2003.
[10] Elchanan Mossel, et al. Learning nonsingular phylogenies and hidden Markov models, 2005, STOC '05.
[11] Yishay Mansour, et al. Reinforcement Learning in POMDPs Without Resets, 2005, IJCAI.
[12] Joelle Pineau, et al. Bayes-Adaptive POMDPs, 2007, NIPS.
[13] Pascal Poupart, et al. Model-based Bayesian Reinforcement Learning in Partially Observable Domains, 2008, ISAIM.
[14] Sham M. Kakade, et al. A spectral algorithm for learning Hidden Markov Models, 2008, J. Comput. Syst. Sci.
[15] Brahim Chaib-draa, et al. Quasi-Deterministic Partially Observable Markov Decision Processes, 2009, ICONIP.
[16] Blai Bonet, et al. Deterministic POMDPs Revisited, 2009, UAI.
[17] Byron Boots, et al. Closing the learning-planning loop with predictive state representations, 2009, Int. J. Robotics Res.
[18] Le Song, et al. Hilbert Space Embeddings of Hidden Markov Models, 2010, ICML.
[19] Thomas L. Griffiths, et al. Faster Teaching by POMDP Planning, 2011, AIED.
[20] David Barber, et al. On the Computational Complexity of Stochastic Controller Optimization in POMDPs, 2011, TOCT.
[21] Anima Anandkumar, et al. A Method of Moments for Mixture Models and Hidden Markov Models, 2012, COLT.
[22] Anima Anandkumar, et al. Tensor decompositions for learning latent variable models, 2012, J. Mach. Learn. Res.
[23] Emma Brunskill, et al. A PAC RL Algorithm for Episodic POMDPs, 2016, AISTATS.
[24] Kamyar Azizzadenesheli, et al. Reinforcement Learning of POMDPs using Spectral Methods, 2016, COLT.
[25] John Langford, et al. PAC Reinforcement Learning with Rich Observations, 2016, NIPS.
[26] Vatsal Sharan, et al. Learning Overcomplete HMMs, 2017, NIPS.
[27] Guy Shani, et al. Iterative Planning for Deterministic QDec-POMDPs, 2018, Global Conference on Artificial Intelligence.
[28] Yisong Yue, et al. Policy Gradient in Partially Observable Environments: Approximation and Convergence, 2018.
[29] Noam Brown, et al. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals, 2018, Science.
[30] Michael I. Jordan, et al. A Short Note on Concentration Inequalities for Random Vectors with SubGaussian Norm, 2019, ArXiv.