Proceedings of the Workshop on Vision and Natural Language Processing