Topic-aware video summarization using multimodal transformer