Generative Model-Based Text-to-Speech Synthesis