Vector quantization of LSF parameters with a mixture of Dirichlet distributions

Quantization of the linear predictive coding parameters is an important part of speech coding. Probability density function (PDF)-optimized vector quantization (VQ) has previously been shown to be more efficient than VQ based only on training data. For data with bounded support, some well-defined bounded-support distributions (e.g., the Dirichlet distribution) have been shown to outperform the conventional Gaussian mixture model (GMM) for the same number of free parameters. When both the boundary and the order properties of the line spectral frequency (LSF) parameters are exploited, the distribution of LSF differences (ΔLSF) can be modelled with a Dirichlet mixture model (DMM). We propose a corresponding DMM-based VQ (DVQ). The elements of a Dirichlet vector variable are highly mutually correlated. Motivated by the neutrality property of the Dirichlet vector variable, we obtain a practical non-linear transformation scheme for it. Similar to the Karhunen-Loève transform for Gaussian variables, this non-linear transformation decomposes the Dirichlet vector variable into a set of independent beta-distributed variables. Using high-rate quantization theory under an entropy constraint, we derive the optimal inter- and intra-component bit allocation strategies. In the implementation of the scalar quantizers, we use constrained-resolution coding to approximate the derived constrained-entropy coding. A practical coding scheme for DVQ is designed to reduce the accumulation of quantization error. Both the theoretical and the practical quantization performance of DVQ are evaluated. Compared with the state-of-the-art GMM-based VQ and the recently proposed beta mixture model (BMM)-based VQ, DVQ performs better, with fewer free parameters and lower computational cost.
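To make the decorrelation step concrete, the following is a minimal sketch of a neutrality-based transform in its standard stick-breaking form. It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the LSFs are assumed to be ordered values in (0, π), and the ordering used by the paper's actual scheme may differ. For x ~ Dir(α_1, ..., α_K), the ratios u_k = x_k / (1 - Σ_{j<k} x_j) are mutually independent with u_k ~ Beta(α_k, Σ_{j>k} α_j), so each u_k can be quantized by its own scalar quantizer.

```python
import math

def lsf_to_dlsf(lsf):
    """Map K ordered LSFs in (0, pi) to K+1 positive differences that sum to 1.
    Normalizing by pi is an assumption made for this sketch."""
    edges = [0.0] + list(lsf) + [math.pi]
    return [(b - a) / math.pi for a, b in zip(edges, edges[1:])]

def dlsf_to_lsf(x):
    """Inverse of lsf_to_dlsf: cumulative sums of the first K differences."""
    out, acc = [], 0.0
    for v in x[:-1]:
        acc += v
        out.append(acc * math.pi)
    return out

def neutral_forward(x):
    """Stick-breaking transform: u_k = x_k / (1 - sum_{j<k} x_j), k = 1..K-1."""
    u, remaining = [], 1.0
    for v in x[:-1]:
        u.append(v / remaining)
        remaining -= v
    return u

def neutral_inverse(u):
    """Rebuild the simplex vector: x_k = u_k * prod_{j<k} (1 - u_j)."""
    x, remaining = [], 1.0
    for v in u:
        x.append(v * remaining)
        remaining *= 1.0 - v
    x.append(remaining)  # the last element takes whatever mass is left
    return x

def beta_params(alpha):
    """Beta parameters of each u_k when x ~ Dir(alpha):
    u_k ~ Beta(alpha_k, sum_{j>k} alpha_j)."""
    tail = sum(alpha)
    params = []
    for a in alpha[:-1]:
        tail -= a
        params.append((a, tail))
    return params

if __name__ == "__main__":
    lsf = [0.3, 0.7, 1.2, 2.0, 2.9]   # made-up example values
    x = lsf_to_dlsf(lsf)
    u = neutral_forward(x)
    print(u)
    print(dlsf_to_lsf(neutral_inverse(u)))
```

Because the inverse rebuilds each x_k from a running product of (1 - u_j) terms, quantization error in an early u_k propagates to later elements, which is consistent with the abstract's concern about error accumulation.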
