论文信息 - Model order estimation using Bayesian NMF for discovering phone patterns in spoken utterances

Model order estimation using Bayesian NMF for discovering phone patterns in spoken utterances

In earlier work, we have shown that vocabulary discovery from spoken utterances and subsequent recognition of the acquired vocabulary can be achieved through Non-negative Matrix Factorization (NMF). An open issue for this task is to determine automatically how many different word representations should be included in the model. In this paper, Bayesian NMF is applied to estimate the model order. The per-utterance word activations are given a gamma prior while the word models are assumed deterministic. Two Bayesian approaches are applied for obtaining optimal parameter values. First, the penalized joint log-likelihood of the parameters is considered as the objective function. Then, maximal marginal likelihood estimator (MMLE) is implemented which obtains the word models maximizing the likelihood after integration over the activations. The variational Bayesian algorithm, which maximizes a lower bound of the marginal log-likelihood, is applied to this optimization problem. The number of required latent components or basis vectors (model order) is estimated by evaluating likelihood metrics. The inferred model order is validated by observing error criteria on a test set. Experiments on synthetic data as well as real speech show that MMLE is more effective for the purpose of model order selection.

Hugo Van hamme | Yaser Norouzi | Sayeh Mirzaei

[1] David Pearce,et al. The aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions , 2000, INTERSPEECH.

[2] Hugo Van hamme,et al. HAC-models: a novel approach to continuous speech recognition , 2008, INTERSPEECH.

[3] Hugo Van hamme,et al. Learning from images and speech with Non-negative Matrix Factorization enhanced by input space scaling , 2010, 2010 IEEE Spoken Language Technology Workshop.

[4] Walter Daelemans,et al. A Self-Learning Assistive Vocal Interface Based on Vocabulary Learning and Grammar Induction , 2012, INTERSPEECH.

[5] Hugo Van hamme,et al. Unsupervised vocabulary discovery using non-negative matrix factorization with graph regularization , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[6] Ali Taylan Cemgil,et al. Bayesian inference in hierarchical non‐negative matrix factorisation models of musical sounds , 2008 .

[7] H. Sebastian Seung,et al. Learning the parts of objects by non-negative matrix factorization , 1999, Nature.