Statistical Speech Model Description with VMF Mixture Model

In this paper, we represent the line spectral frequency (LSF) parameters as a unit vector, which has directional characteristics. The underlying distribution of this unit-vector variable is modeled by a von Mises-Fisher mixture model (VMM). Using high-rate theory, we propose the optimal inter-component bit allocation strategy and derive the distortion-rate (D-R) relation for the VMM-based vector quantizer (VVQ). Experimental results show that the VVQ outperforms our recently introduced DVQ and the conventional GVQ.
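
As a rough illustration of the modeling step, the sketch below evaluates the log-density of a von Mises-Fisher mixture at a unit vector, using the standard vMF density f(x; mu, kappa) = C_d(kappa) exp(kappa mu^T x) with C_d(kappa) = kappa^(d/2-1) / ((2 pi)^(d/2) I_(d/2-1)(kappa)). It is not the paper's implementation: the function names, the series-based Bessel evaluation, and the example parameters are all assumptions made for illustration, and the mapping from LSF parameters to a unit vector is not given here, so the example simply normalizes an arbitrary vector.

```python
import math

def log_bessel_i(v, x, terms=200):
    """log I_v(x) for x > 0, from the power series summed in the log domain.

    Assumption: 200 terms is enough for moderate x (up to roughly a few hundred);
    this is a simple stand-in, not a fast or production-grade routine.
    """
    logs = [(2 * m + v) * math.log(x / 2.0)
            - math.lgamma(m + 1) - math.lgamma(m + v + 1)
            for m in range(terms)]
    top = max(logs)
    return top + math.log(sum(math.exp(t - top) for t in logs))

def vmf_log_pdf(x, mu, kappa):
    """Log-density of a vMF distribution on the unit sphere in R^d (kappa > 0)."""
    d = len(x)
    v = d / 2.0 - 1.0
    log_norm = (v * math.log(kappa)
                - (d / 2.0) * math.log(2.0 * math.pi)
                - log_bessel_i(v, kappa))
    return log_norm + kappa * sum(a * b for a, b in zip(mu, x))

def vmm_log_pdf(x, weights, mus, kappas):
    """Log-density of a vMF mixture: log sum_i w_i f(x; mu_i, kappa_i)."""
    logs = [math.log(w) + vmf_log_pdf(x, mu, k)
            for w, mu, k in zip(weights, mus, kappas)]
    top = max(logs)
    return top + math.log(sum(math.exp(t - top) for t in logs))

def normalize(vec):
    """Scale a nonzero vector to unit Euclidean length."""
    n = math.sqrt(sum(a * a for a in vec))
    return [a / n for a in vec]

# Hypothetical two-component mixture on the sphere in R^4.
x = normalize([0.2, 0.4, 0.3, 0.1])
mus = [normalize([1.0, 1.0, 1.0, 1.0]), normalize([1.0, 0.0, 0.0, 0.0])]
print(vmm_log_pdf(x, [0.6, 0.4], mus, [20.0, 5.0]))
```

Working in the log domain keeps the normalizing constant stable when the concentration kappa is large, which matters once the dimension grows beyond the three-dimensional case where a closed form is available.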
