Empirical properties of multilingual phone-to-word transduction

This paper explores the error robustness of phone-to-word transduction across a variety of languages. We implement a noisy channel model in which a phonetic input stream is corrupted by an error model and then transduced back to words using the inverse error model and linguistic constraints. By controlling the error level, we measure how sensitive different languages are to degradation in the phonetic input stream. We carry this analysis further to measure the importance of each individual phone in each language. We study Arabic, Chinese, English, German and Spanish, and find that they behave similarly in this paradigm: in each case, a phone error produces about 1.4 word errors, and phones that are frequently incorrect matter slightly less than others. Even in the absence of phone errors, transduced word errors are still present, and we use the conditional entropy of words given phones to explain this behavior.
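
To illustrate why word errors can remain even when the phone stream is error-free, the following minimal sketch computes the conditional entropy of words given phones, H(W | P), over a toy pronunciation lexicon. It is an illustration under stated assumptions, not the paper's implementation: the function name `conditional_entropy`, the lexicon format, the phone symbols, and the counts are all hypothetical, and the calculation treats words in isolation rather than in sequence.

```python
import math
from collections import defaultdict


def conditional_entropy(lexicon):
    """Return H(W | P) in bits for (word, phone_string, count) entries.

    Hypothetical helper: each entry is assumed to be a distinct
    (word, pronunciation) pair with a positive count.
    """
    total = sum(count for _, _, count in lexicon)

    # Group word counts by their (shared) phone string.
    counts_by_phones = defaultdict(list)
    for _word, phones, count in lexicon:
        counts_by_phones[phones].append(count)

    entropy = 0.0
    for counts in counts_by_phones.values():
        group_total = sum(counts)
        p_phones = group_total / total
        # Entropy of the word choice within this phone string.
        h_words = -sum(
            (c / group_total) * math.log2(c / group_total) for c in counts
        )
        entropy += p_phones * h_words
    return entropy


# Toy data (invented): "two" and "too" are homophones, "cat" is unambiguous.
toy_lexicon = [
    ("two", "t uw", 2),
    ("too", "t uw", 2),
    ("cat", "k ae t", 4),
]

print(conditional_entropy(toy_lexicon))
```

In this toy lexicon, half of the probability mass falls on a phone string shared equally by two words, so a perfect phone sequence still leaves residual uncertainty about the word. A lexicon without homophones would give zero under this isolated-word simplification.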
