Multimodal Depression Detection Using Task-oriented Transformer-based Embedding