Do Lessons from Metric Learning Generalize to Image-Caption Retrieval?