Contrastive Graph Multimodal Model for Text Classification in Videos