Consistent Multimodal Generation via A Unified GAN Framework