A Novel Incremental Dictionary Learning Method for Low Bit Rate Speech Streaming

Speech streaming, now ubiquitous, consumes a huge amount of transmission bandwidth and storage space. It is therefore important to compress speech with as few bits as possible while keeping the voice clear and the meaning intact. Guided by the speech context, the proposed method dynamically adapts to the speech stream of any speaker by appending atoms to the dictionary. Furthermore, to smoothly represent amplitude envelopes that shift over frequency, the dictionary is extended via the Hilbert transform. The weights of the atoms are bounded from above, so they can be quantized in practical applications. Experimental results show the advantages of our method: at a minimum reconstruction accuracy of 99.8%, which is adequate for general voice communication, the space saving exceeds 99%. Our method suits applications with extreme bandwidth or storage limitations and large-scale datasets, while maintaining reasonable perceptual quality.
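The Hilbert-transform extension mentioned above can be illustrated with a minimal sketch (the function name `extend_with_hilbert` and the demo atom are illustrative, not from the paper): each real atom is paired with its 90-degree phase-shifted counterpart, so a weighted combination of the pair can represent a smoothly shifting amplitude envelope.

```python
import numpy as np
from scipy.signal import hilbert

def extend_with_hilbert(D):
    """Append the Hilbert transform of every atom to the dictionary.

    D: (n_samples, n_atoms) array of real-valued atoms as columns.
    Returns a (n_samples, 2*n_atoms) dictionary whose extra atoms are
    the 90-degree phase-shifted versions of the originals.
    """
    # np.imag(hilbert(x)) is the Hilbert transform of x (computed via FFT)
    H = np.imag(hilbert(D, axis=0))
    D_ext = np.hstack([D, H])
    # renormalize columns so sparse-coding weights stay comparable,
    # which also keeps them easier to bound and quantize
    norms = np.linalg.norm(D_ext, axis=0)
    norms[norms == 0] = 1.0
    return D_ext / norms

# demo: a sine atom gains a (negated) cosine partner after extension
t = np.linspace(0, 1, 256, endpoint=False)
D = np.sin(2 * np.pi * 8 * t)[:, None]
D_ext = extend_with_hilbert(D)
print(D_ext.shape)  # (256, 2)
```

The atom/partner pair spans both quadrature components of the frequency bin, which is what lets the envelope phase be represented without adding atoms for every phase offset.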
