Accelerating Policy Gradient by Estimating Value Function from Prior Computation in Deep Reinforcement Learning