Gradient Information Matters in Policy Optimization by Back-propagating through Model