Spoken language understanding in an intelligent tutoring scenario
This study computationalizes two linguistic concepts, contrast and focus, and applies them to the robust understanding of spontaneous speech in an intelligent tutoring system (ITS). We propose the focus kernel to represent those words that carry novel and important information, neither presupposed by the interlocutor nor contained in the preceding words of the utterance. We also define contrast as a set of words that are parallel in linguistic or information structure but different or contrastive in meaning. In particular, we define three types of contrast: symmetric contrast, contrastive focus, and contrastive topic. We further demonstrate the effectiveness of detecting contrast and the focus kernel by evaluating performance on the ITS corpus. Detection of both is based on word similarity and dissimilarity analysis, part-of-speech tagging, and pitch accent. The classification achieved accuracies of 83.8% for the focus kernel and 85.2% for contrast. Moreover, we demonstrated the usefulness of the focus kernel for content summarization of spoken messages: using accurate transcriptions of the speech input, 75.5% of tutoring events were correctly classified.
In addition, we argue for the importance of detecting cognitive activities for robust speech understanding in the ITS dialogue scenario. In particular, we present and evaluate a cognitive state classification system, which classifies the cognitive activities of child users into three categories: confidence, puzzlement, and hesitation. The classification yielded accuracies of 96.6% for transcribed speech and 95.7% for recognized speech. These results show that cognitive state classification is highly robust to speech recognition errors.
Moreover, this study examines the role of speech itself in spoken language understanding (SLU). Existing semantic analysis of speech is usually performed on text transcriptions. However, humans interpret speech and text through different channels. We acknowledge the role of text in semantic analysis, but we also investigate how speech characteristics can be used to analyze the meaning of speech signals.