Privacy-Preserving Policy Iteration for Decentralized POMDPs

We propose the first privacy-preserving approach to multi-agent planning problems modeled as Dec-POMDPs. Our solution is a distributed, trial-based message-passing algorithm in which the agents' policies are optimized with the cross-entropy method and the agents' private information is protected by a public-key homomorphic cryptosystem. We prove the correctness of our algorithm and analyze its complexity in terms of message passing and encryption/decryption operations. Furthermore, we analyze several privacy aspects of our algorithm and show that it preserves agent privacy with respect to non-neighbors, as well as model privacy and decision privacy. Experimental results on several common Dec-POMDP benchmark problems confirm the effectiveness of our approach.
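The key cryptographic ingredient is a public-key cryptosystem with an additive homomorphism: agents can aggregate encrypted quantities (e.g., value estimates from trials) without revealing the individual contributions. The sketch below is a toy Paillier-style cryptosystem with tiny fixed primes, purely to illustrate the homomorphic property; it is not the paper's implementation, and real deployments require primes of 1024 bits or more.

```python
import math
import random

# Toy key generation: tiny fixed primes for illustration only.
p, q = 17, 19
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)   # Carmichael function of n
g = n + 1                       # standard choice of generator
mu = pow(lam, -1, n)            # valid precisely because g = n + 1

def encrypt(m):
    """Enc(m) = g^m * r^n mod n^2 for random r coprime to n."""
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    """Dec(c) = L(c^lam mod n^2) * mu mod n, with L(x) = (x - 1) / n."""
    return ((pow(c, lam, n2) - 1) // n) * mu % n

# Additive homomorphism: multiplying ciphertexts adds plaintexts mod n.
a, b = 42, 77
assert decrypt(encrypt(a)) == a
assert decrypt(encrypt(a) * encrypt(b) % n2) == (a + b) % n
```

In a distributed protocol of this kind, one agent typically holds the private key while the others submit ciphertexts; products of ciphertexts let the key holder recover only the aggregate, never any single agent's input.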
