Paper information - Modelling speech line spectral frequencies with Dirichlet mixture models

Modelling speech line spectral frequencies with Dirichlet mixture models

Statistical modeling plays an important role in various research areas. It provides a way to connect the data with the statistics. Based on the statistical properties of the observed data, an appropriate model can be chosen that leads to a promising practical performance. The Gaussian distribution is the most popular and dominant probability distribution used in statistics, since it has an analytically tractable Probability Density Function (PDF) and analysis based on it can be derived in an explicit form. However, various data in real applications have bounded support or semi-bounded support. As the support of the Gaussian distribution is unbounded, data of this type are clearly not Gaussian distributed. Thus we can apply non-Gaussian distributions, e.g., the beta distribution or the Dirichlet distribution, to model the distribution of this type of data. The choice of a suitable distribution is favorable for modeling efficiency. Furthermore, the practical performance based on the statistical model can also be improved by better modeling.

An essential part of statistical modeling is to estimate the values of the parameters in the distribution or, if we consider the parameters as random variables, to estimate their distribution. Unlike the Gaussian distribution or the corresponding Gaussian Mixture Model (GMM), a non-Gaussian distribution or a mixture of non-Gaussian distributions does not, in general, have an analytically tractable solution. In this dissertation, we study several estimation methods for non-Gaussian distributions. For Maximum Likelihood (ML) estimation, a numerical method is used to search for the optimal solution in the estimation of the Dirichlet Mixture Model (DMM). For the Bayesian analysis, we use approximations to derive an analytically tractable solution that approximates the distribution of the parameters. Several researchers have shown that methods based on the Variational Inference (VI) framework are efficient for approximating the parameter distribution. Under this framework, we adapt the conventional Factorized Approximation (FA) method to the Extended Factorized Approximation (EFA) method and use it to approximate the parameter distribution of the beta distribution. Also, the Local Variational Inference (LVI) method is applied to approximate the predictive distribution of the beta distribution. Finally, by assigning a beta distribution to each element in the matrix, we propose a variational Bayesian Nonnegative Matrix Factorization (NMF) for bounded support data.

The performance of the proposed non-Gaussian model based methods is evaluated in several experiments. The beta distribution and the Dirichlet distribution are applied to model the Line Spectral Frequency (LSF) representation of the Linear Prediction (LP) model for statistical model based speech coding. The beta distribution is also applied in some image processing applications. The proposed beta distribution based variational Bayesian NMF is applied to image restoration and collaborative filtering. Compared to some conventional statistical model based methods, the non-Gaussian model based methods show a promising improvement.
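
As a rough illustration of the quantity that ML estimation of a DMM would maximize, the sketch below evaluates the log-density of a Dirichlet mixture at one point whose entries are positive and sum to one. It is not the authors' implementation: the function names, the two-component mixture, and all parameter values are hypothetical and chosen only for demonstration.

```python
import math


def dirichlet_log_pdf(x, alpha):
    """Log-density of a Dirichlet distribution at a point x on the simplex."""
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length")
    if any(v <= 0.0 for v in x) or abs(sum(x) - 1.0) > 1e-9:
        raise ValueError("x must be strictly positive and sum to 1")
    # Log of the normalizing constant Gamma(sum alpha) / prod Gamma(alpha_k).
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(v) for a, v in zip(alpha, x))


def dmm_log_pdf(x, weights, alphas):
    """Log-density of a Dirichlet mixture, computed with log-sum-exp."""
    terms = [math.log(w) + dirichlet_log_pdf(x, a)
             for w, a in zip(weights, alphas)]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Hypothetical two-component mixture over 3-dimensional proportions.
weights = [0.6, 0.4]
alphas = [[2.0, 5.0, 3.0], [6.0, 2.0, 2.0]]
x = [0.2, 0.5, 0.3]
print(dmm_log_pdf(x, weights, alphas))
```

Summing `dmm_log_pdf` over a data set gives the log-likelihood. Because its maximizer has no closed form, a numerical search over the parameters is needed, which is the situation the abstract describes.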

Arne Leijon | Zhanyu Ma
