Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval