Text-to-Motion Retrieval: Towards Joint Understanding of Human Motion Data and Natural Language