[1] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[2] Mengdi Wang,et al. Sample-Optimal Parametric Q-Learning Using Linearly Additive Features , 2019, ICML.
[3] Doina Precup,et al. Differentially Private Policy Evaluation , 2016, ICML.
[4] Vianney Perchet,et al. Local Differentially Private Regret Minimization in Reinforcement Learning , 2020, ArXiv.
[5] Michael I. Jordan,et al. A Short Note on Concentration Inequalities for Random Vectors with SubGaussian Norm , 2019, ArXiv.
[6] Prasad Tadepalli,et al. Model-Based Reinforcement Learning , 2010, Encyclopedia of Machine Learning and Data Mining.
[7] Quanquan Gu,et al. Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping , 2020, ICML.
[8] J. Bretagnolle,et al. Estimation des densités: risque minimax, 1978.
[9] Mengdi Wang,et al. Model-Based Reinforcement Learning with Value-Targeted Regression , 2020, L4DC.
[10] Nan Jiang,et al. Contextual Decision Processes with low Bellman rank are PAC-Learnable , 2016, ICML.
[11] Xiaoyu Chen,et al. (Locally) Differentially Private Combinatorial Semi-Bandits , 2020, ICML.
[12] Martin J. Wainwright,et al. Minimax Optimal Procedures for Locally Private Estimation , 2016, ArXiv.
[13] Sofya Raskhodnikova,et al. What Can We Learn Privately?, 2008, 49th Annual IEEE Symposium on Foundations of Computer Science.
[14] Roshan Shariff,et al. Differentially Private Contextual Linear Bandits , 2018, NeurIPS.
[15] Ian Goodfellow,et al. Deep Learning with Differential Privacy , 2016, CCS.
[16] Kai Zheng,et al. Locally Differentially Private (Contextual) Bandits Learning , 2020, NeurIPS.
[17] Christos Dimitrakakis,et al. Differential Privacy for Multi-armed Bandits: What Is It and What Is Its Cost?, 2019, ArXiv.
[18] Ruosong Wang,et al. Optimism in Reinforcement Learning with Generalized Linear Function Approximation , 2019, ICLR.
[19] Csaba Szepesvári,et al. Improved Algorithms for Linear Stochastic Bandits , 2011, NIPS.
[20] Martin J. Wainwright,et al. Local Privacy, Data Processing Inequalities, and Statistical Minimax Rates , 2013, arXiv:1302.3203.
[21] Aaron Roth,et al. The Algorithmic Foundations of Differential Privacy , 2014, Found. Trends Theor. Comput. Sci.
[22] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[23] Thomas M. Cover,et al. Elements of Information Theory, 2005.
[24] Vitaly Shmatikov,et al. Membership Inference Attacks Against Machine Learning Models , 2016, 2017 IEEE Symposium on Security and Privacy (SP).
[25] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[26] Michael I. Jordan,et al. Provably Efficient Reinforcement Learning with Linear Function Approximation , 2019, COLT.
[27] Csaba Szepesvári,et al. Bandit Algorithms, 2020.
[28] Michael L. Littman,et al. An analysis of model-based Interval Estimation for Markov Decision Processes , 2008, J. Comput. Syst. Sci..
[29] Alessandro Lazaric,et al. Learning Near Optimal Policies with Low Inherent Bellman Error , 2020, ICML.
[30] Quanquan Gu,et al. Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes , 2020, COLT.
[31] Rémi Munos,et al. Minimax Regret Bounds for Reinforcement Learning , 2017, ICML.
[32] Benjamin Van Roy,et al. Eluder Dimension and the Sample Complexity of Optimistic Exploration , 2013, NIPS.
[33] Emilie Kaufmann,et al. Corrupt Bandits for Preserving Local Privacy , 2017, ALT.
[34] L. Schmetterer. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 1963.
[35] Quanquan Gu,et al. Logarithmic Regret for Reinforcement Learning with Linear Function Approximation , 2020, ICML.
[36] Akshay Krishnamurthy,et al. Private Reinforcement Learning with PAC and Regret Guarantees , 2020, ICML.
[37] Cynthia Dwork,et al. Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.