Modality-Invariant Image-Text Embedding for Image-Sentence Matching

Directly matching data across different modalities (such as image and text) can benefit many tasks in computer vision, multimedia, information retrieval, and information fusion. Most existing wo...