Learning to Dequantize Speech Signals by Primal-dual Networks: an Approach for Acoustic Sensor Networks

We introduce a method to improve the quality of simple scalar quantization in acoustic sensor networks by combining ideas from sparse reconstruction, artificial neural networks, and weighting filters. We start from the observation that iterative optimization methods for sparse reconstruction resemble the structure of a neural network. Hence, building upon a successful enhancement method, we unroll its algorithm into a neural network, which we train to obtain enhanced decoding. In addition, the weighting filter from code-excited linear predictive (CELP) speech coding is integrated into the loss function of the neural network to achieve perceptually improved reconstructed speech. Our experiments show that the proposed trained methods reconstruct speech better than the reference optimization methods.
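To make the unrolling idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a uniform scalar quantizer, an orthonormal DCT as the sparsifying transform, and an l1 objective constrained to the quantization cells, solved with a first-order primal-dual iteration. Each iteration is written as one "layer" with its own operator and step sizes, which are the quantities a trained network could learn. All function names, step sizes, and the test signal are hypothetical, and the CELP weighting-filter loss is not reproduced.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (assumed sparsifying transform, not from the paper)."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    c[0, :] /= np.sqrt(2.0)
    return c

def quantize(x, step):
    """Uniform scalar quantizer: returns the lower and upper bound of each sample's cell."""
    idx = np.floor(x / step)
    return idx * step, (idx + 1.0) * step

def unrolled_primal_dual(lower, upper, layers):
    """Forward pass of an unrolled primal-dual iteration.

    Targets min ||K x||_1 subject to lower <= x <= upper.
    `layers` is a list of (K, sigma, tau, theta) tuples, one per layer;
    in a trained network these would be the learnable parameters.
    """
    x = 0.5 * (lower + upper)      # start from the cell midpoints
    x_bar = x.copy()
    y = np.zeros_like(x)
    for K, sigma, tau, theta in layers:
        # Dual step: projection onto the l-infinity unit ball (prox of the l1 conjugate).
        y = np.clip(y + sigma * (K @ x_bar), -1.0, 1.0)
        # Primal step: projection onto the quantization cells.
        x_new = np.clip(x - tau * (K.T @ y), lower, upper)
        # Extrapolation step.
        x_bar = x_new + theta * (x_new - x)
        x = x_new
    return x

# Hypothetical test signal that is sparse in the DCT domain.
n = 64
C = dct_matrix(n)
coeffs = np.zeros(n)
coeffs[[2, 5, 11]] = [1.5, -0.8, 0.4]
signal = C.T @ coeffs

lower, upper = quantize(signal, step=0.25)

# Untrained initialization: every layer shares the same operator and step sizes,
# chosen so that sigma * tau * ||K||^2 < 1 (here ||K|| = 1).
layers = [(C, 0.9, 0.9, 1.0)] * 50
x_hat = unrolled_primal_dual(lower, upper, layers)
```

With identical parameters in every layer, this forward pass is just the plain optimization method run for a fixed number of iterations. Because the final operation of each layer is the projection onto the cells, the output always stays consistent with the received quantization indices, whatever the parameter values.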
