Spontaneous Speech Summarization: Transformers All The Way Through
This paper proposes a summarization system for spontaneous speech. The proposed system consists of speech segmentation, speech recognition, and extractive text summarization modules. We use the Transformer architecture for all modules. Its self-attention mechanism captures both global and local context in the input sequence, which enables strong performance. Furthermore, we introduce a novel data augmentation method for speech summarization that uses the outputs of the speech segmentation and recognition modules. The proposed data augmentation addresses the ambiguity of sentence boundaries in spontaneous speech, improving robustness to speech segmentation and recognition errors. We conduct an experimental evaluation on the Corpus of Spontaneous Japanese (CSJ), which consists of Japanese speech such as lectures and conference talks. Through this evaluation, we investigate the performance of each individual module and how the modules interact in terms of text summarization performance, and we demonstrate the effectiveness of the proposed data augmentation method.
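The abstract credits self-attention with letting each module use context from the whole input sequence. As a rough illustration only (not the authors' implementation), the sketch below computes single-head scaled dot-product self-attention in plain Python. It assumes identity query/key/value projections and omits learned weights, multiple heads, and positional encodings, all of which a real Transformer layer would include. The helper names `softmax` and `self_attention` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """Single-head scaled dot-product self-attention.

    Simplifying assumption: Q = K = V = x (identity projections, no learned weights).
    Returns (outputs, attention_weights).
    """
    d = len(x[0])
    scale = 1.0 / math.sqrt(d)
    outputs: List[Vector] = []
    weights: List[Vector] = []
    for q in x:
        # Score this position against every position in the sequence,
        # so each output can draw on the entire input (global context).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in x]
        w = softmax(scores)
        weights.append(w)
        # The output is the attention-weighted sum of all value vectors.
        outputs.append([sum(w[j] * x[j][i] for j in range(len(x))) for i in range(d)])
    return outputs, weights


if __name__ == "__main__":
    seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    out, attn = self_attention(seq)
    for row in attn:
        print([round(w, 3) for w in row])
```

Each row of the attention matrix sums to 1 and spans every position in the sequence; in a trained model, the learned projections decide whether a given position concentrates on nearby tokens (local context) or distant ones (global context).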