Today's state-of-the-art front-ends for multilingual speech-to-speech translation systems apply monolingual speech recognizers, each trained for a single language and/or accent. The monolingual speech engine can usually adapt to an unknown speaker over time using unsupervised training methods; however, if the speaker was seen during training, their specialized acoustic model is applied instead, since it achieves better performance. To make full use of specialized acoustic models in this scenario, the speaker must be identified automatically and with high accuracy. Furthermore, monolingual speech recognizers currently rely on the user selecting the language and/or accent beforehand. This requires the user's cooperation and an interface that makes such selection easy. Both requirements are awkward and error-prone, especially when translation services are provided for many languages on small devices such as PDAs or telephones. For these scenarios, front-ends that automatically identify the spoken language or accent are desirable. We believe that the automatic identification of an utterance's non-verbal cues, such as language, accent and speaker, is necessary for the successful deployment of speech-to-speech translation systems.