The clustering algorithm for the definition of multilingual set of context dependent speech models

The paper addresses the problem of designing a language-independent phonetic inventory for speech recognisers with a multilingual vocabulary. A new clustering algorithm for defining a multilingual set of triphones is proposed. The algorithm is based on a distance measure for triphones, defined as a weighted sum of explicit estimates of context similarity at the monophone level. The monophone similarity estimation method is based on the algorithm of Houtgast. The clustering algorithm is integrated into a multilingual speech recognition system based on HTK V2.1.1. The ongoing experiments are based on the SpeechDat II databases; so far they have included the Slovenian, Spanish and German 1000 FDB SpeechDat (II) databases. Results to date are promising: the clustering algorithm significantly reduced the number of triphones, with an acceptable degradation in word accuracy and language identification accuracy.
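The abstract does not give the exact form of the distance measure, so the following is only a minimal sketch of the general idea, under stated assumptions: a triphone is a (left, centre, right) tuple, monophone similarities are values in [0, 1] stored in a lookup table, and the distance is one minus a normalised weighted sum of the three position-wise similarities. The weights, the threshold, the phone labels, the greedy grouping step and all function names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

Triphone = Tuple[str, str, str]          # (left context, centre phone, right context)
SimTable = Dict[Tuple[str, str], float]  # assumed symmetric monophone similarities in [0, 1]


def mono_sim(sim: SimTable, a: str, b: str) -> float:
    """Similarity of two monophones; identical labels score 1, unknown pairs 0."""
    if a == b:
        return 1.0
    return sim.get((a, b), sim.get((b, a), 0.0))


def triphone_distance(t1: Triphone, t2: Triphone, sim: SimTable,
                      weights: Tuple[float, float, float] = (0.25, 0.5, 0.25)) -> float:
    """Distance as 1 minus a weighted sum of position-wise similarities.

    The weights are illustrative assumptions, not values from the paper.
    """
    total = sum(weights)
    score = sum(w * mono_sim(sim, p1, p2) for w, p1, p2 in zip(weights, t1, t2))
    return 1.0 - score / total


def greedy_cluster(triphones: List[Triphone], sim: SimTable,
                   threshold: float = 0.2) -> List[List[Triphone]]:
    """Assign each triphone to the first cluster whose seed is close enough.

    A simple stand-in for the paper's clustering step, which is not
    specified in the abstract.
    """
    clusters: List[List[Triphone]] = []
    for t in triphones:
        for c in clusters:
            if triphone_distance(t, c[0], sim) <= threshold:
                c.append(t)
                break
        else:
            clusters.append([t])
    return clusters


if __name__ == "__main__":
    # Hypothetical similarity between a Slovenian and a German /a/.
    sim = {("sl_a", "de_a"): 0.8}
    tris = [("s", "sl_a", "t"), ("s", "de_a", "t"), ("k", "o", "p")]
    for cluster in greedy_cluster(tris, sim):
        print(cluster)
```

With these assumed values, the two triphones that differ only in a highly similar centre phone fall into one cluster, which illustrates how merging across languages reduces the size of the triphone inventory.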
