Model order estimation using Bayesian NMF for discovering phone patterns in spoken utterances

In earlier work, we showed that vocabulary discovery from spoken utterances, and subsequent recognition of the acquired vocabulary, can be achieved through Non-negative Matrix Factorization (NMF). An open issue for this task is how to determine automatically how many distinct word representations the model should include. In this paper, Bayesian NMF is applied to estimate the model order. The per-utterance word activations are given a gamma prior, while the word models are treated as deterministic parameters. Two Bayesian approaches are applied to obtain optimal parameter values. First, the penalized joint log-likelihood of the parameters is used as the objective function. Second, a maximum marginal likelihood estimator (MMLE) is implemented, which obtains the word models that maximize the likelihood after integrating out the activations. A variational Bayesian algorithm, which maximizes a lower bound on the marginal log-likelihood, is applied to this optimization problem. The number of required latent components, or basis vectors (the model order), is estimated by evaluating likelihood metrics. The inferred model order is validated by observing error criteria on a test set. Experiments on synthetic data as well as real speech show that MMLE is the more effective of the two approaches for model order selection.
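As a rough illustration only, and not the implementation used in this work, the first of the two approaches (the penalized joint log-likelihood) can be sketched as follows. Several choices here are assumptions that the abstract does not state: a Poisson observation model (equivalently, a KL-divergence NMF cost), a gamma prior with shape 1 (an exponential prior) and rate `b` on the activations, multiplicative updates, and a heuristic renormalization of the word-model columns. The names `fit_penalized_nmf`, `penalized_joint_loglik`, and all parameter values are hypothetical.

```python
import numpy as np

def fit_penalized_nmf(V, K, b=0.1, n_iter=200, seed=0, eps=1e-12):
    """Fit V ~ W @ H with K components by multiplicative updates.

    Assumptions (not taken from the paper): Poisson likelihood and an
    exponential prior (gamma, shape 1, rate b) on the activations H.
    W holds the deterministic word models, one per column.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + eps
    W /= W.sum(axis=0, keepdims=True)          # columns of W sum to 1
    H = rng.random((K, N)) + eps
    H *= V.sum() / H.sum()                     # match the total mass of V
    for _ in range(n_iter):
        R = V / (W @ H + eps)
        # Activation update; the prior adds the rate b to the denominator.
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + b)
        R = V / (W @ H + eps)
        # Word-model update (no prior on W), then heuristic renormalization
        # so the penalty on H cannot be evaded by rescaling W.
        W *= (R @ H.T) / (H.sum(axis=1)[None, :] + eps)
        W /= W.sum(axis=0, keepdims=True) + eps
    return W, H

def penalized_joint_loglik(V, W, H, b=0.1, eps=1e-12):
    """Poisson log-likelihood plus exponential log-prior on H.

    The -log(V!) term is dropped; it does not depend on W, H, or K.
    """
    Vhat = W @ H + eps
    loglik = np.sum(V * np.log(Vhat) - Vhat)
    logprior = H.size * np.log(b) - b * H.sum()
    return loglik + logprior

# Synthetic count data with three underlying components (illustrative only).
rng = np.random.default_rng(0)
W_true = rng.gamma(1.0, 1.0, size=(20, 3))
H_true = rng.gamma(1.0, 1.0, size=(3, 50))
V = rng.poisson(10.0 * W_true @ H_true).astype(float)

# Evaluate the penalized objective for a range of candidate model orders.
scores = {}
for K in range(1, 6):
    W, H = fit_penalized_nmf(V, K)
    scores[K] = penalized_joint_loglik(V, W, H)
```

After the loop, `scores` maps each candidate model order to its penalized objective, which can then be compared across orders. The sketch does not cover the MMLE approach, which would additionally require integrating out the activations, for example through a variational lower bound.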