Optimal Estimation of Off-Policy Policy Gradient via Double Fitted Iteration