Normalizing Flow Model for Policy Representation in Continuous Action Multi-agent Systems

Neural network policies that output the parameters of a diagonal Gaussian distribution are widely used in reinforcement learning tasks with continuous action spaces. They have had considerable success in single-agent domains and even in some multi-agent tasks. However, general multi-agent tasks often require mixed strategies whose distributions cannot be well approximated by Gaussians or mixtures of Gaussians. This paper proposes an alternative policy representation based on normalizing flows, which allows for more flexible action distributions than mixture models. We demonstrate its advantage over standard methods on a set of imitation learning tasks that model human driving behavior in the presence of other drivers.

ACM Reference Format: Xiaobai Ma, Jayesh K. Gupta, and Mykel J. Kochenderfer. 2020. Normalizing Flow Model for Policy Representation in Continuous Action Multi-agent Systems. In Proc. of the 19th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2020), Auckland, New Zealand, May 9–13, 2020, IFAAMAS, 3 pages.
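To make the policy representation concrete, the sketch below shows how a state-conditional normalizing-flow policy can be assembled from RealNVP-style affine coupling layers. This is a minimal illustration under stated assumptions, not the authors' implementation: PyTorch is assumed, the class names (AffineCoupling, FlowPolicy) and hyperparameters are hypothetical, and the action dimension is assumed to be even.

```python
# Minimal sketch (assumptions: PyTorch, even action dimension; names are
# illustrative, not from the paper) of a state-conditional normalizing-flow
# policy built from RealNVP-style affine coupling layers.
import torch
import torch.nn as nn


class AffineCoupling(nn.Module):
    """One coupling layer: rescales and shifts half of the latent
    dimensions, conditioned on the other half and the observation."""

    def __init__(self, act_dim, obs_dim, hidden=64, flip=False):
        super().__init__()
        assert act_dim % 2 == 0, "sketch assumes an even action dimension"
        self.half = act_dim // 2
        self.flip = flip
        self.net = nn.Sequential(
            nn.Linear(self.half + obs_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 2 * self.half),
        )

    def forward(self, z, obs):
        z1, z2 = z[:, :self.half], z[:, self.half:]
        if self.flip:  # alternate which half passes through unchanged
            z1, z2 = z2, z1
        log_scale, shift = self.net(torch.cat([z1, obs], -1)).chunk(2, -1)
        z2 = z2 * torch.exp(log_scale) + shift
        out = torch.cat([z2, z1], -1) if self.flip else torch.cat([z1, z2], -1)
        return out, log_scale.sum(-1)  # log |det Jacobian| of this layer


class FlowPolicy(nn.Module):
    """Maps base Gaussian noise through coupling layers to an action;
    exposes exact log-probabilities, unlike an implicit generator."""

    def __init__(self, obs_dim, act_dim, n_layers=4):
        super().__init__()
        self.act_dim = act_dim
        self.layers = nn.ModuleList(
            AffineCoupling(act_dim, obs_dim, flip=(i % 2 == 1))
            for i in range(n_layers)
        )

    def sample(self, obs):
        base = torch.distributions.Normal(
            torch.zeros(obs.shape[0], self.act_dim),
            torch.ones(obs.shape[0], self.act_dim),
        )
        z = base.sample()
        log_prob = base.log_prob(z).sum(-1)
        for layer in self.layers:
            z, log_det = layer(z, obs)
            log_prob = log_prob - log_det  # change-of-variables correction
        return z, log_prob  # action and its exact log-probability


# Usage: sample actions for a batch of 32 observations.
policy = FlowPolicy(obs_dim=10, act_dim=2)
actions, log_probs = policy.sample(torch.randn(32, 10))
```

Because each coupling layer is invertible with a triangular Jacobian, the same stack can be run in reverse to evaluate exact likelihoods of expert actions, which is what maximum-likelihood imitation learning requires; alternating the flip flag ensures every action dimension gets transformed by some layer, so the policy can represent multimodal distributions that a diagonal Gaussian cannot.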
