Representations for Multimodal Generation: A Workshop Report