2A1-L03 The reward distribution based on peripheral information for multi-agent reinforcement learning