A Scalable Privacy-Preserving Multi-Agent Deep Reinforcement Learning Approach for Large-Scale Peer-to-Peer Transactive Energy Trading

Peer-to-peer (P2P) transactive energy trading has emerged as a promising paradigm for maximizing the flexibility value of prosumers’ distributed energy resources (DERs). Although reinforcement learning constitutes a well-suited model-free, data-driven framework for optimizing prosumers’ energy management decisions, its application to large-scale coordinated management and P2P trading among multiple prosumers within an energy community remains challenging, owing to the scalability, non-stationarity and privacy limitations of state-of-the-art multi-agent deep reinforcement learning (MADRL) approaches. This paper proposes a novel P2P transactive trading scheme based on the multi-actor-attention-critic (MAAC) algorithm, which addresses each of the above challenges. The method is complemented by a P2P trading platform that incentivizes prosumers to engage in local energy trading while also penalizing each prosumer’s contribution to rebound peaks. Case studies involving a real-world, large-scale scenario with 300 residential prosumers demonstrate that the proposed method significantly outperforms state-of-the-art MADRL methods in reducing the community’s cost and peak demand.