Non-negative durational HMM

The non-negative hidden Markov model (N-HMM) has been proposed in the literature as a combination of non-negative matrix factorisation (NMF) and the hidden Markov model (HMM) for modelling mixtures of non-stationary signals using latent variables. The original N-HMM formulation does not generalise to unseen data, which limits its use in automatic speech recognition (ASR). We propose modifications to the N-HMM formulation that allow it to generalise to unseen data, making it suitable for ASR. We refer to the modified model as the non-negative durational HMM (NdHMM). We derive the expectation-maximisation (EM) algorithm for estimating the NdHMM parameters and show that the proposed model requires fewer parameters than a conventional HMM.
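The following sketch samples a toy spectrogram from an N-HMM-style generative model under simplifying assumptions. Its purpose is to make concrete how the N-HMM combines NMF and HMM. It follows the original N-HMM structure as we understand it. Each hidden state owns a small dictionary of spectral components, which is the NMF part. A Markov chain selects the active state in each frame, which is the HMM part. Each frame is then a histogram of frequency-bin draws from a mixture of that state's components.

This is not the NdHMM or its EM algorithm. The function `sample_nhmm` and its parameter names are hypothetical. The mixture weights are also held fixed per state, whereas the N-HMM lets them vary from frame to frame.

```python
import random


def sample_nhmm(transitions, initial, dictionaries, weights,
                num_frames, draws_per_frame, seed=0):
    """Sample (states, spectrogram) from a toy N-HMM-style model.

    Hypothetical sketch, not the NdHMM:
      transitions[q][r]  : P(next state = r | current state = q)
      initial[q]         : P(first state = q)
      dictionaries[q][z] : component z of state q, a distribution over frequency bins
      weights[q][z]      : mixture weight of component z in state q
                           (fixed per state here; the N-HMM varies them per frame)
    Each frame is a list of counts, one per frequency bin.
    """
    rng = random.Random(seed)
    num_bins = len(dictionaries[0][0])
    states, spectrogram = [], []

    # HMM part: draw the initial state, then follow the Markov chain.
    q = rng.choices(range(len(initial)), weights=initial)[0]
    for t in range(num_frames):
        if t > 0:
            q = rng.choices(range(len(transitions[q])), weights=transitions[q])[0]
        states.append(q)

        # NMF/PLCA part: each draw picks a component of the active state's
        # dictionary, then a frequency bin from that component.
        frame = [0] * num_bins
        for _ in range(draws_per_frame):
            z = rng.choices(range(len(weights[q])), weights=weights[q])[0]
            f = rng.choices(range(num_bins), weights=dictionaries[q][z])[0]
            frame[f] += 1
        spectrogram.append(frame)
    return states, spectrogram
```

Consider two states whose components cover disjoint frequency bins, with a transition matrix that alternates between them. In that case, the sampled frames alternate between the two spectral regions. Fitting such a model to real speech would require estimating the dictionaries, weights and transitions, for example with EM. That step is omitted here.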
