A Transformer-Based Model With Self-Distillation for Multimodal Emotion Recognition in Conversations