Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning
Fei Feng | Ruosong Wang | Wotao Yin | Simon S. Du | Lin F. Yang