Material for Learning Robust Visual-Semantic Embeddings