Quantum policy gradient algorithms