Evaluating Generative Models for Text Generation

Generating human-quality text is a challenging problem because of the ambiguity of meaning and the difficulty of modeling long-term semantic dependencies. Recurrent Neural Networks (RNNs) have shown promising results in this problem domain; the most common training approach is to maximize the log predictive likelihood of each true token in the training sequence given the previously observed tokens. Scheduled Sampling (Bengio et al., 2015) was proposed as an improvement to the maximum-likelihood approach: it stochastically introduces inference steps during training. More recently, Generative Adversarial Nets (GANs), which use a discriminative model to guide the training of a generative model, have become popular in the vision domain, and Yu et al. (2016) reinterpreted the adversarial objective as a reinforcement learning problem to adapt it to text generation (SeqGAN). Here we test and compare these three approaches, extending the evaluation presented for the SeqGAN model in Yu et al. (2016) with two additional datasets and an additional perplexity evaluation metric.
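The core mechanism of scheduled sampling can be illustrated with a toy sketch (not any author's implementation): at each step, the network's input is the ground-truth previous token with some probability, and the model's own previous prediction otherwise. Here `predict` is a hypothetical stand-in for the model's next-token function.

```python
import random


def scheduled_sampling_inputs(true_tokens, predict, epsilon, seed=0):
    """Build the input sequence for one training pass over a sequence.

    At each step, the input is the ground-truth previous token with
    probability `epsilon` (teacher forcing); otherwise it is the model's
    own prediction from the previous step (an inference step).
    `predict` is a placeholder for the model's next-token prediction.
    """
    rng = random.Random(seed)
    inputs = [true_tokens[0]]          # the first input is always given
    prev_pred = predict(true_tokens[0])
    for t in range(1, len(true_tokens)):
        if rng.random() < epsilon:
            inputs.append(true_tokens[t])   # teacher forcing
        else:
            inputs.append(prev_pred)        # feed back the model's output
        prev_pred = predict(inputs[-1])
    return inputs
```

With `epsilon = 1.0` this reduces to standard maximum-likelihood training (pure teacher forcing); with `epsilon = 0.0` it is free-running generation. Scheduled sampling anneals `epsilon` from 1 toward a smaller value over the course of training, exposing the model to its own prediction errors.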