Reparameterized Policy Learning for Multimodal Trajectory Optimization