Policy Gradient in Partially Observable Environments: Approximation and Convergence