Text-to-image synthesis with self-supervised learning