SP TELEPHONE SPEECH

In this paper we report on experiments with phone recognition of spontaneous telephone speech. Phone recognizers were trained and assessed on IDEAL, a multilingual corpus containing telephone speech in French, British English, German and Castilian Spanish. We investigated the influence of the training material composition (size and linguistic content) on recognition performance using context-independent (CI) Hidden Markov Models (HMMs) and phonotactic bigram models. We found that when testing on spontaneous speech data, using only spontaneous speech training data gave the highest phone accuracies for the four languages, even though this data comprises only 14% of the available training data. The use of context-dependent (CD) HMMs reduced the phone error across the 4 languages, with the average error reduced to 51.9% from the 57.4% obtained with CI models. We suggest a straightforward way of detecting non-speech phenomena. The basic idea is to remove sequences of consonants between two silence labels from the recognized phone strings prior to scoring. This simple technique reduces the average phone error rate by 5.4% relative. The lowest phone error with CD models and filtering was obtained for Spanish (39.1%), with the 4-language average being 49.1%.
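The filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `filter_consonant_runs`, the silence label `"sil"`, and the consonant set are assumptions chosen for the example, and the abstract does not say whether the two silence labels left adjacent after filtering are merged (here they are kept).

```python
from typing import List, Set


def filter_consonant_runs(phones: List[str],
                          consonants: Set[str],
                          silence: str = "sil") -> List[str]:
    """Drop any run made up only of consonants that lies between two silence labels.

    The label names and the consonant inventory are hypothetical; a real
    system would use the phone set of the recognizer for each language.
    """
    out: List[str] = []
    i, n = 0, len(phones)
    while i < n:
        out.append(phones[i])
        if phones[i] == silence:
            # Scan the run of consonants that directly follows this silence.
            j = i + 1
            while j < n and phones[j] in consonants:
                j += 1
            # Remove the run only if it is non-empty and closed by a silence.
            if j > i + 1 and j < n and phones[j] == silence:
                i = j  # the closing silence is appended on the next iteration
                continue
        i += 1
    return out


# Usage with a toy phone string: "s t" between two silences is removed,
# while "a b" is kept because the run contains a vowel.
hyp = ["sil", "s", "t", "sil", "a", "b", "sil"]
print(filter_consonant_runs(hyp, consonants={"s", "t", "b", "k"}))
# prints ['sil', 'sil', 'a', 'b', 'sil']
```

The filtered string is then scored in place of the raw recognizer output. The intuition is that a stretch containing no vowel between two silences is more likely to be noise or another non-speech event than a word.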
