Subspace-GMM acoustic models for under-resourced languages: feasibility study

Acoustic model parameter estimation is hampered by a lack of data. To reduce the number of parameters to be estimated, we propose sub-GMM modelling, which constrains the acoustic models to a low- dimensional manifold embedded in the space of Gaussian mixture weights. The manifold model is obtained through non-negative matrix factorization with sparsity constraints. Our preliminary mono- lingual experiments show that the proposed model is as efficient as clustering the distributions to a smaller set, while it opens perspectives for a new parameter tying technique. In the example, the number of parameters to be estimated per distribution is reduced more than an order of magnitude.

[1]  C. D. Gelatt,et al.  Optimization by Simulated Annealing , 1983, Science.

[2]  Mei-Yuh Hwang,et al.  Predicting unseen triphones with senones , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[3]  P. Niyogi,et al.  A Geometric Perspective on Speech Sounds , 2005 .

[4]  Chin-Hui Lee,et al.  Toward a detector-based universal phone recognizer , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[5]  Richard M. Schwartz,et al.  The 2004 BBN/LIMSI 20xRT English conversational telephone speech recognition system , 2005, INTERSPEECH.

[6]  Zhaoshui He,et al.  Extended SMART Algorithms for Non-negative Matrix Factorization , 2006, ICAISC.

[7]  Kris Demuynck,et al.  Extracting, modelling and combining information in speech recognition , 2001 .

[8]  Wu Chou,et al.  Robust decision tree state tying for continuous speech recognition , 2000, IEEE Trans. Speech Audio Process..

[9]  H. Sebastian Seung,et al.  Algorithms for Non-negative Matrix Factorization , 2000, NIPS.

[10]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[11]  Laurent Besacier,et al.  Automatic Speech Recognition for Under-Resourced Languages: Application to Vietnamese Language , 2009, IEEE Transactions on Audio, Speech, and Language Processing.

[12]  Andrzej Cichocki,et al.  Families of Alpha- Beta- and Gamma- Divergences: Flexible and Robust Measures of Similarities , 2010, Entropy.

[13]  Tanja Schultz,et al.  Language-independent and language-adaptive acoustic modeling for speech recognition , 2001, Speech Commun..

[14]  Ngoc Thang Vu,et al.  Rapid Building of an ASR System for Under-Resourced Languages Based on Multilingual Unsupervised Training , 2011, INTERSPEECH.

[15]  Patrik O. Hoyer,et al.  Non-negative Matrix Factorization with Sparseness Constraints , 2004, J. Mach. Learn. Res..