Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning

Motivated by the prevailing paradigm of using unsupervised learning for efficient exploration in reinforcement learning (RL) problems [39, 55], we investigate when this paradigm is provably efficient. We study episodic Markov decision processes with rich observations generated from a small number of latent states. We present a general algorithmic framework that is built upon two components: an unsupervised learning algorithm and a no-regret tabular RL algorithm. Theoretically, we prove that as long as the unsupervised learning algorithm enjoys a polynomial sample complexity guarantee, we can find a near-optimal policy with sample complexity polynomial in the number of latent states, which is significantly smaller than the number of observations. Empirically, we instantiate our framework on a class of hard exploration problems to demonstrate the practicality of our theory.
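
To make the two-component framework concrete, here is a minimal Python sketch, not the paper's exact algorithm: a clustering-based unsupervised learner (k-means) decodes rich observations into a small set of latent states, and an optimistic tabular Q-learning routine with count-based bonuses then explores over the decoded states. The environment interface (env.reset/env.step in the old Gym style), the bonus scaling, and all helper names are illustrative assumptions.

import numpy as np
from sklearn.cluster import KMeans

def collect_observations(env, n_samples, horizon):
    """Gather raw observations with a uniformly random policy."""
    obs = []
    for _ in range(n_samples):
        o = env.reset()
        for _ in range(horizon):
            obs.append(o)
            o, _, done, _ = env.step(env.action_space.sample())
            if done:
                break
    return np.array(obs)

def learn_decoder(observations, n_latent_states):
    """Unsupervised component: cluster observations into latent states."""
    km = KMeans(n_clusters=n_latent_states).fit(observations)
    return lambda o: int(km.predict(np.asarray(o).reshape(1, -1))[0])

def tabular_q_learning(env, decode, n_states, n_actions, horizon, n_episodes, bonus_scale=1.0):
    """Tabular component: optimistic Q-learning with count-based bonuses on decoded states."""
    Q = np.full((horizon, n_states, n_actions), float(horizon))   # optimistic initialization
    counts = np.zeros((horizon, n_states, n_actions))
    for _ in range(n_episodes):
        s = decode(env.reset())
        for h in range(horizon):
            a = int(np.argmax(Q[h, s]))                            # greedy w.r.t. optimistic Q
            o, r, done, _ = env.step(a)
            s_next = decode(o)
            counts[h, s, a] += 1
            lr = (horizon + 1) / (horizon + counts[h, s, a])       # decaying step size
            bonus = bonus_scale / np.sqrt(counts[h, s, a])         # exploration bonus
            future = 0.0 if h == horizon - 1 else Q[h + 1, s_next].max()
            Q[h, s, a] = (1 - lr) * Q[h, s, a] + lr * min(r + bonus + future, horizon)
            s = s_next
            if done:
                break
    return Q

# Example wiring (hypothetical environment `env` with vector observations):
# obs = collect_observations(env, n_samples=1000, horizon=10)
# decode = learn_decoder(obs, n_latent_states=5)
# Q = tabular_q_learning(env, decode, n_states=5, n_actions=env.action_space.n,
#                        horizon=10, n_episodes=500)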

[1] P. Deb. Finite Mixture Models, 2008.

[2] Rémi Munos et al. Error Bounds for Approximate Value Iteration, 2005, AAAI.

[3] Rémi Munos et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.

[4] Dominique Bontemps et al. Clustering and variable selection for categorical multivariate data, 2010, arXiv:1002.1142.

[5] Nan Jiang et al. Information-Theoretic Considerations in Batch Reinforcement Learning, 2019, ICML.

[6] Dimitris Achlioptas et al. On Spectral Learning of Mixtures of Distributions, 2005, COLT.

[7] Nan Jiang et al. Provably efficient RL with Rich Observations via Latent State Decoding, 2019, ICML.

[8] John Langford et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.

[9] Jason Pazis et al. PAC Optimal Exploration in Continuous Space Markov Decision Processes, 2013, AAAI.

[10] Santosh S. Vempala et al. A spectral algorithm for learning mixture models, 2004, J. Comput. Syst. Sci.

[11] Emmanuel J. Candès et al. Robust Subspace Clustering, 2013, arXiv.

[12] René Vidal et al. Sparse Subspace Clustering: Algorithm, Theory, and Applications, 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13] Ruosong Wang et al. Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle, 2019, NeurIPS.

[14] Mengdi Wang et al. Learning to Control in Metric Space with Optimal Regret, 2019, 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[15] P. Müller et al. Model-Based Clustering for Expression Data via a Dirichlet Process Mixture Model, 2006.

[16] Csaba Szepesvári et al. Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path, 2006, COLT.

[17] Matthieu Geist et al. A Theory of Regularized Markov Decision Processes, 2019, ICML.

[18] Lihong Li et al. PAC model-free reinforcement learning, 2006, ICML.

[19] Sham M. Kakade et al. Variance Reduction Methods for Sublinear Reinforcement Learning, 2018, arXiv.

[20] Alfons Juan-Císcar et al. Bernoulli mixture models for binary images, 2004, Proceedings of the 17th International Conference on Pattern Recognition (ICPR).

[21] Sanjeev Arora et al. Learning mixtures of arbitrary gaussians, 2001, STOC.

[22] Zhuoran Yang et al. A Theoretical Analysis of Deep Q-Learning, 2019, L4DC.

[23] Geoffrey J. McLachlan et al. Mixture models: inference and applications to clustering, 1989.

[24] D. B. Dahl. Bayesian Inference for Gene Expression and Proteomics: Model-Based Clustering for Expression Data via a Dirichlet Process Mixture Model, 2006.

[25] Zheng Wen et al. Efficient Exploration and Value Function Generalization in Deterministic Systems, 2013, NIPS.

[26] Shane Legg et al. Noisy Networks for Exploration, 2017, ICLR.

[27] Thomas J. Walsh et al. Knows what it knows: a framework for self-aware learning, 2008, ICML.

[28] Xiaofei Wang et al. Application of Subspace Clustering in DNA Sequence Analysis, 2015, J. Comput. Biol.

[29] Huan Xu et al. Provable Subspace Clustering: When LRR Meets SSC, 2013, IEEE Transactions on Information Theory.

[30] Alexei A. Efros et al. Curiosity-Driven Exploration by Self-Supervised Prediction, 2017, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[31] Michael I. Jordan et al. Provably Efficient Reinforcement Learning with Linear Function Approximation, 2019, COLT.

[32] Moses Charikar et al. Similarity estimation techniques from rounding algorithms, 2002, STOC.

[33] Emma Brunskill et al. Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds, 2019, ICML.

[34] Nan Jiang et al. Contextual Decision Processes with low Bellman rank are PAC-Learnable, 2016, ICML.

[35] Hans-Peter Kriegel et al. Subspace clustering, 2012, WIREs Data Mining Knowl. Discov.

[36] Jianfeng Gao et al. BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems, 2016, AAAI.

[37] Benjamin Van Roy et al. Generalization and Exploration via Randomized Value Functions, 2014, ICML.

[38] Nan Jiang et al. Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches, 2018, COLT.

[39] Filip De Turck et al. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning, 2016, NIPS.

[40] John Langford et al. PAC Reinforcement Learning with Rich Observations, 2016, NIPS.

[41] Hongyuan Zha et al. Computational Statistics & Data Analysis, 2021.

[42] Hamid R. Rabiee et al. Reliable clustering of Bernoulli mixture models, 2017.

[43] Sham M. Kakade et al. Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes, 2019, COLT.

[44] Kamyar Azizzadenesheli et al. Efficient Exploration Through Bayesian Deep Q-Networks, 2018, Information Theory and Applications Workshop (ITA).

[45] Michael Kearns et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.

[46] Zhao Song et al. Efficient Model-free Reinforcement Learning in Metric Spaces, 2019, arXiv.

[47] Michael I. Jordan et al. Is Q-learning Provably Efficient?, 2018, NeurIPS.

[48] Sanjoy Dasgupta et al. A Two-Round Variant of EM for Gaussian Mixtures, 2000, UAI.

[49] P. Müller et al. Bayesian inference for gene expression and proteomics, 2006.

[50] Tor Lattimore et al. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning, 2017, NIPS.

[51] Alfons Juan-Císcar et al. On the use of Bernoulli mixture models for text classification, 2001, Pattern Recognit.

[52] Shipra Agrawal et al. Optimistic posterior sampling for reinforcement learning: worst-case regret bounds, 2017, NIPS.

[53] Max Simchowitz et al. Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs, 2019, NeurIPS.

[54] Jeff G. Schneider et al. Policy Search by Dynamic Programming, 2003, NIPS.

[55] Tom Schaul et al. Unifying Count-Based Exploration and Intrinsic Motivation, 2016, NIPS.

[56] Peter Auer et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.

[57] Ruosong Wang et al. Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity, 2020, arXiv.

[58] Ruosong Wang et al. Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?, 2020, ICLR.

[59] Mengdi Wang et al. Sample-Optimal Parametric Q-Learning Using Linearly Additive Features, 2019, ICML.

[60] Aravindan Vijayaraghavan et al. On Learning Mixtures of Well-Separated Gaussians, 2017, IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS).

[61] Nan Jiang et al. On Oracle-Efficient PAC RL with Rich Observations, 2018, NeurIPS.

[62] Matthieu Geist et al. Local Policy Search in a Convex Space and Conservative Policy Iteration as Boosted Policy Search, 2014, ECML/PKDD.