Detection of Fillers Using Prosodic Features in Spontaneous Speech Recognition of Japanese

A new scheme for detecting fillers during spontaneous speech recognition was developed. When a filler hypothesis appears during the second-pass decoding of a two-pass speech recognizer, a prosodic module checks the morpheme hypothesized as a filler and outputs the likelihood that the morpheme is a filler. When this likelihood exceeds a threshold, a prosodic score is added to the language score of the hypothesis as a bonus. The prosodic module is built on a five-layered perceptron, which calculates the filler likelihood from prosodic features of the current, preceding, and following morphemes. A comparative recognition experiment with and without the prosodic module was conducted on 100 utterances of spontaneous speech drawn from the academic meeting presentations in the Corpus of Spontaneous Japanese. With the prosodic module, seven fillers originally misrecognized as non-fillers were correctly recognized as fillers, and no fillers originally recognized as fillers were wrongly recognized as non-fillers. Although introducing the prosodic module caused a few non-filler morphemes to be misrecognized as other non-filler morphemes, these errors can be corrected by properly setting the parameters of the second-pass search process. These results indicate that the proposed scheme can improve the performance of spontaneous speech recognition.
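
The thresholded bonus applied during second-pass decoding can be illustrated with a minimal sketch. It is not the authors' implementation: the `Hypothesis` type, the `rescore` function, the default threshold of 0.5, and the choice of bonus (a weight times the filler likelihood) are all hypothetical, since the abstract does not specify how the prosodic score is computed or scaled.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """A morpheme hypothesis in the second pass (hypothetical structure)."""
    morpheme: str
    is_filler_hypothesis: bool
    language_score: float  # assumed to be a log-domain language score


def rescore(hyp: Hypothesis, filler_likelihood: float,
            threshold: float = 0.5, bonus_weight: float = 1.0) -> float:
    """Return the language score after the optional prosodic bonus.

    filler_likelihood stands in for the output of the prosodic module.
    The bonus form (bonus_weight * filler_likelihood) is an assumption.
    """
    # Only hypotheses proposed as fillers are checked by the prosodic module.
    if hyp.is_filler_hypothesis and filler_likelihood > threshold:
        return hyp.language_score + bonus_weight * filler_likelihood
    # Non-filler hypotheses and low-likelihood fillers keep their score.
    return hyp.language_score
```

In this sketch a filler hypothesis whose likelihood clears the threshold gains score relative to competing non-filler hypotheses, while every other hypothesis is left unchanged. This matches the behaviour described above, in which the bonus is applied only when the threshold is exceeded.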