Toward a detector-based universal phone recognizer

In recent research, we proposed a high-accuracy, bottom-up, detection-based paradigm for continuous phone recognition. The key component of that system is a bank of articulatory detectors. Each detector computes a score describing how strongly the current frame exhibits a given phonetic feature. In this work, we present our first attempt at designing a universal phone recognizer using the detection-based approach. We show that the technique is intrinsically language independent: reliable articulatory detectors can be designed for diverse languages, and robust detection can be performed across languages. Moreover, we design a universal set of detectors by pooling the training material available for several diverse languages. We further demonstrate that the approach can decode new target languages without retraining and without acoustic adaptation. The phone recognition performance we report on the OGI Multi-language Telephone Speech corpus compares favorably with the best results known to the authors.
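The detection-plus-merging idea can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper. The seven-attribute set is invented. The hand-written activations stand in for trained detector outputs. The naive-Bayes-style rule that merges them replaces whatever merger and decoder the authors actually used. The point the sketch shows is narrow. A phone is scored only through its attribute specification, so decoding a different phone inventory (`lang_y` below, a hypothetical language with /b/ but no /p/) requires a new attribute table rather than retrained detectors.

```python
import math
from itertools import groupby
from typing import Dict, List, Sequence, Set

# Hypothetical attribute set; the paper's actual detector inventory is not reproduced here.
ATTRIBUTES = ("vowel", "stop", "nasal", "voiced", "labial", "low", "high")

Frame = Dict[str, float]         # attribute -> detector activation in [0, 1]
Inventory = Dict[str, Set[str]]  # phone label -> attributes the phone exhibits


def phone_log_score(frame: Frame, spec: Set[str], eps: float = 1e-6) -> float:
    """Assumed naive-Bayes-style merge: reward present attributes, penalize absent ones."""
    total = 0.0
    for attr in ATTRIBUTES:
        s = min(max(frame[attr], eps), 1.0 - eps)  # clip to avoid log(0)
        total += math.log(s) if attr in spec else math.log(1.0 - s)
    return total


def decode(frames: Sequence[Frame], inventory: Inventory) -> List[str]:
    """Pick the best-scoring phone per frame, then collapse consecutive repeats."""
    best = [max(inventory, key=lambda ph: phone_log_score(f, inventory[ph])) for f in frames]
    return [ph for ph, _ in groupby(best)]


# Made-up detector outputs for a /p/-like and an /a/-like frame.
frame_p = {"vowel": 0.05, "stop": 0.9, "nasal": 0.1, "voiced": 0.2,
           "labial": 0.8, "low": 0.1, "high": 0.1}
frame_a = {"vowel": 0.95, "stop": 0.05, "nasal": 0.05, "voiced": 0.9,
           "labial": 0.1, "low": 0.85, "high": 0.1}
frames = [frame_p, frame_p, frame_a, frame_a, frame_a]

# Two hypothetical phone inventories sharing the same detectors.
lang_x = {"p": {"stop", "labial"}, "b": {"stop", "voiced", "labial"},
          "m": {"nasal", "voiced", "labial"}, "a": {"vowel", "voiced", "low"},
          "i": {"vowel", "voiced", "high"}}
lang_y = {ph: spec for ph, spec in lang_x.items() if ph != "p"}  # no /p/

print(decode(frames, lang_x))  # ['p', 'a']
print(decode(frames, lang_y))  # ['b', 'a']
```

The same detector outputs yield /p a/ under one inventory and /b a/ under the other, because /b/ is the closest attribute match available once /p/ is absent. Frame-wise argmax with repeat collapsing is the crudest possible decoder. A practical system would feed the merged scores into an HMM-style search with a phone language model.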
