A careful look at the evolution of the power grid reveals fundamental changes in its structure that have taken place gradually over a long period. One of the most important of these is the expansion of the communication infrastructure, which brings many advantages to the cyber layer of the system. A core advantage is the practical implementation of peer-to-peer (P2P) energy trading, which, however, may also expose the system to risks such as cyber-attacks. Consequently, an effective way to address such challenges is needed. This paper focuses on the online detection of false data injection attacks (FDIAs), which attempt to disrupt optimal P2P energy trading under stochastic conditions. To this end, it proposes an effective, modified Intelligent Priority Selection based Reinforcement Learning (IPS-RL) method that detects and stops malicious attacks as quickly as possible, enabling effective energy trading within the P2P structure. To validate its effectiveness, the proposed method is compared with support vector machine (SVM), reinforcement learning (RL), particle swarm optimization (PSO)-RL, and genetic algorithm (GA)-RL approaches. The method is implemented and tested on three interconnected microgrids arranged in a P2P structure, each containing various agents such as photovoltaic (PV) units, wind turbines, fuel cells, tidal systems, and storage units. Finally, the unscented transformation (UT) is applied for uncertainty analysis, making the simulations closer to real operating conditions.
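To illustrate what applying the UT for uncertainty analysis involves, the following is a minimal sketch of the basic (Julier-Uhlmann) unscented transformation for a single scalar Gaussian input. It is an assumption-laden illustration, not the paper's implementation: the abstract does not state which UT variant, which uncertain inputs, or which parameter values are used, so the function `unscented_transform_1d`, the choice of one scalar input, and the default `kappa = 2` (that is, `n + kappa = 3` for `n = 1`) are all hypothetical. The idea is that a few deterministically chosen sigma points are pushed through a nonlinear model, and their weighted outputs approximate the mean and variance of the model output.

```python
import math
from typing import Callable, Tuple


def unscented_transform_1d(
    f: Callable[[float], float],
    mean: float,
    var: float,
    kappa: float = 2.0,  # hypothetical default; n + kappa = 3 suits a scalar Gaussian
) -> Tuple[float, float]:
    """Approximate the mean and variance of f(x) for x ~ N(mean, var).

    Sketch of the basic unscented transform with n = 1 (one uncertain
    input, e.g. a forecast quantity); not the paper's specific model.
    """
    n = 1
    if var < 0:
        raise ValueError("variance must be non-negative")
    if n + kappa <= 0:
        raise ValueError("n + kappa must be positive")

    # 2n + 1 = 3 sigma points: the mean and one point on each side of it.
    spread = math.sqrt((n + kappa) * var)
    sigma_points = [mean, mean + spread, mean - spread]

    # Weights: kappa / (n + kappa) for the centre point, 1 / (2(n + kappa)) for the others.
    weights = [kappa / (n + kappa)] + [1.0 / (2.0 * (n + kappa))] * 2

    # Propagate each sigma point through the (possibly nonlinear) model.
    outputs = [f(x) for x in sigma_points]

    # Weighted statistics of the transformed points.
    y_mean = sum(w * y for w, y in zip(weights, outputs))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, outputs))
    return y_mean, y_var


if __name__ == "__main__":
    # Toy nonlinear model y = x**2 with x ~ N(1, 0.5).
    print(unscented_transform_1d(lambda x: x ** 2, 1.0, 0.5))
```

For the toy model `y = x**2` with `x ~ N(1, 0.5)`, the sketch returns a mean of 1.5 and a variance of 2.5 (up to floating-point rounding), which coincide with the exact moments in this case while evaluating the model only three times. That small number of evaluations, compared with thousands for Monte Carlo sampling, is the usual motivation for choosing the UT in such studies.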