Multimodality grounded translation by humans and machines