Paper information - Multi-frame Quantization of LSF Parameters Using a Deep Autoencoder and Pyramid Vector Quantizer

This paper presents a multi-frame quantization scheme for line spectral frequency (LSF) parameters using a deep autoencoder (DAE) and a pyramid vector quantizer (PVQ). The objective is to provide sophisticated LSF quantization for ultra-low bit rate speech coders with moderate delay. For the compression and de-correlation of multiple LSF frames, a DAE possessing linear coder-layer units with Gaussian noise is used. The DAE demonstrates a high degree of modelling flexibility for multiple LSF frames. To quantize the coder-layer vector effectively, a PVQ is considered. Compared with the discrete cosine model (DCM), the DAE-based compression shows better modelling accuracy for multi-frame LSF parameters and has the advantage that the coder-layer dimension can take any value. The coder-layer dimension of the DAE governs the trade-off between the modelling distortion and the coder-layer quantization distortion. Experimental results show that the proposed algorithm, with the optimal coder-layer dimension determined, outperforms the DCM-based multi-frame LSF quantization approach in terms of spectral distortion (SD) performance and robustness across different speech segments.
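For readers unfamiliar with pyramid vector quantization, the following is a minimal sketch of the general technique, not the paper's implementation. It maps a real-valued vector (standing in for a coder-layer vector) to the nearest-by-rounding integer point whose absolute values sum to a fixed pulse count `k`. The function name `pvq_quantize`, the rounding strategy, and the handling of the all-zero input are assumptions made for illustration; gain coding and codeword indexing are omitted.

```python
def pvq_quantize(x, k):
    """Return an integer vector y with sum(|y_i|) == k approximating the shape of x.

    Illustrative sketch only: scales x onto the L1 "pyramid" of radius k,
    floors the magnitudes, then hands out the leftover pulses to the
    components with the largest fractional remainders.
    """
    l1 = sum(abs(v) for v in x)
    if l1 == 0.0:
        # Assumption: for an all-zero input, place every pulse on the first component.
        y = [0] * len(x)
        y[0] = k
        return y

    scaled = [abs(v) * k / l1 for v in x]   # magnitudes that sum to (about) k
    mags = [int(s) for s in scaled]          # floor, since all values are >= 0
    remaining = k - sum(mags)

    # Indices sorted by fractional remainder, largest first.
    order = sorted(range(len(x)), key=lambda i: scaled[i] - mags[i], reverse=True)
    for i in order[:max(remaining, 0)]:
        mags[i] += 1

    # Guard against floating-point rounding that could overshoot k.
    for i in reversed(order):
        if remaining >= 0:
            break
        if mags[i] > 0:
            mags[i] -= 1
            remaining += 1

    # Restore the signs of the original components.
    return [m if v >= 0 else -m for m, v in zip(mags, x)]


if __name__ == "__main__":
    print(pvq_quantize([0.5, -0.25, 0.25], 4))
```

A larger `k` gives a finer codebook at the cost of more bits, which is one side of the trade-off the abstract describes between modelling distortion and coder-layer quantization distortion.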
