Joint lexicon, acoustic unit inventory and model design

Although most parameters in a speech recognition system are estimated from data by optimizing an objective function, the acoustic unit inventory and the lexicon are generally hand-crafted and are therefore unlikely to be optimal. This paper proposes a joint solution to the related problems of learning a unit inventory and the corresponding lexicon from data. On a speaker-independent read-speech task with a 1,000-word vocabulary, the proposed algorithm outperforms phone-based systems of both high and low complexity.
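To make "learning a lexicon from data" concrete, the sketch below shows one much simpler, commonly used heuristic: for each word, pick as its baseform the observed acoustic-unit string with the smallest total edit distance to all observed tokens of that word (a string medoid). This is not the joint algorithm proposed in the paper, which optimizes the inventory and lexicon together under an acoustic objective. The unit labels (`u3`, `u7`, ...), the observation format and the helper names are hypothetical, chosen only for illustration.

```python
from typing import Sequence

# Hypothetical acoustic-unit label, e.g. "u17"; not a label set from the paper.
Unit = str


def edit_distance(a: Sequence[Unit], b: Sequence[Unit]) -> int:
    """Levenshtein distance between two unit sequences, all edits cost 1."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute or match
        prev = cur
    return prev[-1]


def select_baseform(tokens: list[tuple[Unit, ...]]) -> tuple[Unit, ...]:
    """Return the observed string minimizing total edit distance to all tokens.

    A simplified stand-in (string medoid), not a maximum-likelihood baseform.
    """
    candidates = sorted(set(tokens))  # sorted so ties are broken deterministically
    return min(candidates, key=lambda c: sum(edit_distance(c, t) for t in tokens))


def build_lexicon(
    observations: dict[str, list[tuple[Unit, ...]]],
) -> dict[str, tuple[Unit, ...]]:
    """Map each word to one baseform chosen from its observed unit strings."""
    return {word: select_baseform(toks) for word, toks in observations.items()}


if __name__ == "__main__":
    # Hypothetical unit-recognizer outputs for four spoken tokens of one word.
    obs = {"ship": [("u3", "u7", "u1"), ("u3", "u7", "u1"),
                    ("u3", "u9", "u1"), ("u3", "u7")]}
    print(build_lexicon(obs))
```

Here `("u3", "u7", "u1")` wins with a total distance of 2, against 4 for each of the other two candidates. A joint method would additionally re-estimate the unit models and could revise the unit inventory itself, which this heuristic leaves fixed.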
