Training Acoustic Models with Speech Data from Different Languages

We present a technique to train acoustic models for a target language using speech data from distinct source languages; no native training data from the target language is required. The acoustic model candidates for each target-language phoneme are automatically selected from a group of existing source languages by means of a combined phonetic-phonological (CPP) metric, developed by incorporating statistically derived phonetic and phonological distance information (Liu and Melnar, Interspeech 2005). The method assumes the availability of sufficient native training data for the source languages and of pronunciation lexica for both the target and the source languages. Once the model candidates are determined for each target-language phoneme, the target HMMs are trained on the source-language speech data by means of a “silkie-hen-on-duck-eggs” strategy: the target phoneme model training is embedded in the source phoneme model training. The recognition performance of the resultant models is comparable to that of our previously reported CPP-derived models built through multi-mixture construction, while the size of the current models is only a fraction of that of the previous models, depending on the number of HMM candidates used per target phoneme. Utilizing the CPP metric, both versions of the models reach the performance of models generated by a data-driven acoustic-distance mapping approach, far exceeding that of general phoneme-symbol-based cross-language transfer strategies.
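As a rough illustration of the candidate-selection step, the sketch below ranks source-language phonemes for a target phoneme by a weighted combination of a phonetic and a phonological distance and keeps the top-k as HMM training candidates. All names, weights, and distance values here are hypothetical, not the paper's actual CPP metric or phone inventories.

```python
# Hypothetical sketch of CPP-style candidate selection: for each target-language
# phoneme, rank source-language phonemes by a combined phonetic-phonological
# distance and keep the top-k as HMM training candidates.

def cpp_distance(phonetic_d, phonological_d, alpha=0.5):
    """Combine the two distances into one score; alpha is an assumed weight."""
    return alpha * phonetic_d + (1.0 - alpha) * phonological_d

def select_candidates(source_distances, k=2):
    """source_distances: {(source_lang, source_phoneme): (phonetic_d, phonological_d)}.
    Returns the k source phonemes closest to the target phoneme under the
    combined distance."""
    ranked = sorted(source_distances.items(),
                    key=lambda item: cpp_distance(*item[1]))
    return [src for src, _ in ranked[:k]]

# Toy distances for one target phoneme against three source-language phonemes.
distances = {
    ("spanish", "e"): (0.10, 0.05),
    ("german", "e:"): (0.20, 0.15),
    ("english", "ae"): (0.60, 0.50),
}
print(select_candidates(distances, k=2))
# → [('spanish', 'e'), ('german', 'e:')]
```

The selected candidates would then supply the source-language speech data on which the target phoneme HMM is trained in the embedded-training step described above.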

[1] Noam Chomsky, et al. The Sound Pattern of English, 1968.

[2] Brett Kessler, et al. Computational dialectology in Irish Gaelic, 1995, EACL.

[3] Etienne Barnard, et al. Phone clustering using the Bhattacharyya distance, 1996, Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP '96).

[4] Harold L. Somers. Similarity Metrics for Aligning Children's Articulation Data, 1998, COLING-ACL.

[5] Tanja Schultz, et al. Multilingual and Crosslingual Speech Recognition, 1998.

[6] Joachim Köhler. Language adaptation of multilingual phone models for vocabulary independent speech recognition tasks, 1998, ICASSP.

[7] William J. Byrne, et al. Towards language independent acoustic modeling, 2000, IEEE ICASSP.

[8] Tanja Schultz, et al. Polyphone decision tree specialization for language adaptation, 2000, IEEE ICASSP.

[9] Tanja Schultz, et al. Language-independent and language-adaptive acoustic modeling for speech recognition, 2001, Speech Communication.
[10] Elizabeth C. Botha, et al. Comparison of acoustic distance measures for automatic cross-language phoneme mapping, 2002.

[11] Philips, et al. Cross-language transfer of multilingual phoneme models, 2003.

[12] Chen Liu, et al. An automated linguistic knowledge-based cross-language transfer method for building acoustic models for a language without native training data, 2005, INTERSPEECH.