Reinforcement learning with augmented states in partially expectation and action observable environment