Privacy-preserving reinforcement learning

We consider the problem of distributed reinforcement learning (DRL) from private perceptions. In our setting, the agents' perceptions, such as states, rewards, and actions, are not only distributed across agents but must also be kept private. Conventional DRL algorithms can handle multiple agents, but they do not necessarily preserve privacy and may not guarantee optimality. In this work, we design cryptographic solutions that achieve optimal policies without requiring the agents to share their private information.
