论文信息 - A comparison of phone and grapheme-based spoken term detection

A comparison of phone and grapheme-based spoken term detection

We propose grapheme-based sub-word units for spoken term detection (STD). Compared to phones, graphemes have a number of potential advantages. For out-of-vocabulary search terms, phone- based approaches must generate a pronunciation using letter-to-sound rules. Using graphemes obviates this potentially error-prone hard decision, shifting pronunciation modelling into the statistical models describing the observation space. In addition, long-span grapheme language models can be trained directly from large text corpora. We present experiments on Spanish and English data, comparing phone and grapheme-based STD. For Spanish, where phone and grapheme-based systems give similar transcription word error rates (WERs), grapheme-based STD significantly outperforms a phone- based approach. The converse is found for English, where the phone- based system outperforms a grapheme approach. However, we present additional analysis which suggests that phone-based STD performance levels may be achieved by a grapheme-based approach despite lower transcription accuracy, and that the two approaches may usefully be combined. We propose a number of directions for future development of these ideas, and suggest that if grapheme-based STD can match phone-based performance, the inherent flexibility in dealing with out-of-vocabulary terms makes this a desirable approach.

[1] Dong Wang,et al. A comparison of grapheme and phoneme-based units for Spanish spoken term detection , 2008, Speech Commun..

[2] Simon King,et al. Multisyn: Open-domain unit selection for the Festival speech synthesis system , 2007, Speech Commun..

[3] Lukás Burget,et al. Comparison of keyword spotting approaches for informal continuous speech , 2005, INTERSPEECH.

[4] Antonio Quilis. El comentario fonológico y fonético de textos : teoría y práctica , 1985 .

[5] José B. Mariño,et al. Albayzin speech database: design of the phonetic corpus , 1993, EUROSPEECH.

[6] Tanja Schultz,et al. Grapheme based speech recognition , 2003, INTERSPEECH.

[7] W. Russell,et al. Continuous hidden Markov modeling for speaker-independent word spotting , 1989, International Conference on Acoustics, Speech, and Signal Processing,.