Guiding Representation Learning in Deep Generative Models with Policy Gradients