[1] William Yang Wang, et al. Deep Reinforcement Learning for NLP, 2018, ACL.
[2] Tor Lattimore, et al. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning, 2017, NIPS.
[3] Vianney Perchet, et al. Local Differentially Private Regret Minimization in Reinforcement Learning, 2020, ArXiv.
[4] Aaron Roth, et al. Mechanism design in large games: incentives and privacy, 2012, ITCS.
[5] Qi Cai, et al. Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy, 2019, ArXiv.
[6] Massimiliano Pontil, et al. Empirical Bernstein Bounds and Sample-Variance Penalization, 2009, COLT.
[7] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[8] Ness B. Shroff, et al. Multi-Armed Bandits with Local Differential Privacy, 2020, ArXiv.
[9] Michael L. Littman, et al. An analysis of model-based Interval Estimation for Markov Decision Processes, 2008, J. Comput. Syst. Sci.
[10] N. Hegde, et al. Privacy-Preserving Q-Learning with Functional Noise in Continuous Spaces, 2019, NeurIPS.
[11] E. Ordentlich, et al. Inequalities for the L1 Deviation of the Empirical Distribution, 2003.
[12] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[13] Xiaoyu Chen, et al. (Locally) Differentially Private Combinatorial Semi-Bandits, 2020, ICML.
[14] Shie Mannor, et al. Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies, 2019, NeurIPS.
[15] Nishant A. Mehta, et al. Optimal Algorithms for Private Online Learning in a Stochastic Environment, 2021, ArXiv.
[16] Emilie Kaufmann, et al. Corrupt Bandits for Preserving Local Privacy, 2017, ALT.
[17] Rémi Munos, et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.
[18] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[19] Martin J. Wainwright, et al. Local privacy and statistical minimax rates, 2013, 2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[20] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[21] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[22] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[23] Akshay Krishnamurthy, et al. Private Reinforcement Learning with PAC and Regret Guarantees, 2020, ICML.
[24] Shie Mannor, et al. Exploration-Exploitation in Constrained MDPs, 2020, ArXiv.
[25] Adam D. Smith, et al. (Nearly) Optimal Algorithms for Private Online Learning in Full-information and Bandit Settings, 2013, NIPS.
[26] Emma Brunskill, et al. Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds, 2019, ICML.
[27] Christos Dimitrakakis, et al. Algorithms for Differentially Private Multi-Armed Bandits, 2015, AAAI.
[28] Aditya Gopalan, et al. Online Learning in Kernelized Markov Decision Processes, 2019, AISTATS.
[29] Or Sheffet, et al. An Optimal Private Stochastic-MAB Algorithm Based on an Optimal Private Stopping Rule, 2019, ICML.
[30] Doina Precup, et al. Differentially Private Policy Evaluation, 2016, ICML.
[31] Fredrik D. Johansson, et al. Guidelines for reinforcement learning in healthcare, 2019, Nature Medicine.
[32] Michael I. Jordan, et al. Advances in Neural Information Processing Systems 30, 1995.
[33] Aaron Roth, et al. The Algorithmic Foundations of Differential Privacy, 2014, Found. Trends Theor. Comput. Sci.
[34] Chi Jin, et al. Provably Efficient Exploration in Policy Optimization, 2019, ICML.
[35] Pieter Abbeel, et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.
[36] Kai Zheng, et al. Locally Differentially Private (Contextual) Bandits Learning, 2020, NeurIPS.
[37] Nikita Mishra, et al. (Nearly) Optimal Differentially Private Stochastic Multi-Arm Bandits, 2015, UAI.
[38] Zhaoran Wang, et al. Neural Policy Gradient Methods: Global Optimality and Rates of Convergence, 2019, ICLR.
[39] Mengdi Wang, et al. Model-Based Reinforcement Learning with Value-Targeted Regression, 2020, L4DC.
[40] Haipeng Luo, et al. Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition, 2020, ICML.
[41] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[42] Sayak Ray Chowdhury, et al. Adaptive Control of Differentially Private Linear Quadratic Systems, 2021, 2021 IEEE International Symposium on Information Theory (ISIT).
[43] Wei Chu, et al. A contextual-bandit approach to personalized news article recommendation, 2010, WWW '10.
[44] Tim Roughgarden, et al. Private matchings and allocations, 2013, SIAM J. Comput.
[45] Jian Tan, et al. Local Differential Privacy for Bayesian Optimization, 2020, AAAI.
[46] Hajime Ono, et al. Locally Private Distributed Reinforcement Learning, 2020, ArXiv.
[47] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[48] Abhimanyu Dubey, et al. No-Regret Algorithms for Private Gaussian Process Bandit Optimization, 2021, AISTATS.
[49] Karan Singh, et al. The Price of Differential Privacy for Online Learning, 2017, ICML.
[50] Peter Kairouz, et al. Discrete Distribution Estimation under Local Privacy, 2016, ICML.
[51] Teng Wang, et al. Locally Differentially Private Data Collection and Analysis, 2019, ArXiv.
[52] Cynthia Breazeal, et al. Affective Personalization of a Social Robot Tutor for Children's Second Language Skills, 2016, AAAI.
[53] Pravesh Kothari, et al. Differentially Private Online Learning, 2012, COLT (25th Annual Conference on Learning Theory).
[54] Marc Teboulle, et al. Mirror descent and nonlinear projected subgradient methods for convex optimization, 2003, Oper. Res. Lett.
[55] Roshan Shariff, et al. Differentially Private Contextual Linear Bandits, 2018, NeurIPS.
[56] Cynthia Dwork, et al. Differential Privacy: A Survey of Results, 2008, TAMC.
[57] Christos Dimitrakakis, et al. Achieving Privacy in the Adversarial Multi-Armed Bandit, 2017, AAAI.
[58] Elaine Shi, et al. Private and Continual Release of Statistics, 2010, TSEC.
[59] Jianfeng Gao, et al. Deep Reinforcement Learning for Dialogue Generation, 2016, EMNLP.
[60] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[61] Michael I. Jordan, et al. Provably Efficient Reinforcement Learning with Linear Function Approximation, 2019, COLT.
[62] Shie Mannor, et al. Optimistic Policy Optimization with Bandit Feedback, 2020, ICML.