Self-supervised adversarial learning for cross-modal retrieval