Distributed Offline Policy Optimization Over Batch Data

Federated learning (FL) has received increasing interest in recent years. However, most existing work focuses on supervised learning, and federated learning for sequential decision making remains underexplored. Part of the reason is that learning a policy for sequential decision making typically requires repeated interaction with the environment, which is costly in many FL applications. To overcome this issue, this work proposes a federated offline policy optimization method, abbreviated FedOPO, that allows clients to jointly learn the optimal policy without interacting with environments during training. Despite the nonconcave-convex-strongly-concave nature of the resulting max-min-max problem, we establish both the local and global convergence of our FedOPO algorithm. Experiments on the OpenAI Gym demonstrate that our algorithm finds a near-optimal policy while enjoying various merits brought by FL, including training speedup and improved asymptotic performance.
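
To make the problem structure concrete, the schematic below shows the generic shape of such a federated max-min-max program; the symbols here ($N$ clients, local objectives $L_n$, policy parameters $\theta$, and auxiliary variables $\nu$ and $\zeta$) are illustrative placeholders of ours, not necessarily the paper's actual notation:

\[
\max_{\theta}\ \min_{\nu}\ \max_{\zeta}\ \frac{1}{N}\sum_{n=1}^{N} L_n(\theta, \nu, \zeta),
\]

where each client $n$ evaluates its local objective $L_n$ on its own batch of offline data, the outer problem in the policy parameters $\theta$ is nonconcave, the middle problem in $\nu$ is convex, and the inner problem in $\zeta$ is strongly concave, matching the nonconcave-convex-strongly-concave structure described in the abstract.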
