Source Coding of Audio Signals with a Generative Model

We consider source coding of audio signals with the help of a generative model. In the proposed construction, a waveform is first quantized, yielding a finite-bitrate representation. The waveform is then reconstructed by random sampling from a model conditioned on the quantized waveform. We analyze the proposed coding scheme theoretically. Using SampleRNN as the generative model, we demonstrate that the proposed coding structure provides performance competitive with state-of-the-art source coding tools for specific categories of audio signals.
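The encoder/decoder structure described above (quantize, then reconstruct by sampling from a model conditioned on the quantized signal) can be illustrated in a toy setting. This is a minimal sketch under stated assumptions, not the system evaluated here. The "waveform" is i.i.d. Laplacian noise, the encoder is a uniform scalar quantizer, and the "generative model" is the exact per-sample conditional distribution p(x | cell) of that toy source rather than SampleRNN, which is a learned autoregressive model. All function names and parameter values are hypothetical. For comparison, the sketch also includes a conditional-mean decoder.

```python
import math
import random


def laplace_source(n, b, rng):
    """Toy 'waveform': n i.i.d. Laplacian samples with scale b (stand-in for audio)."""
    return [rng.choice((-1.0, 1.0)) * rng.expovariate(1.0 / b) for _ in range(n)]


def quantize(x, step):
    """Encoder: uniform scalar quantizer producing one integer cell index per sample.

    Cell k covers [k*step, (k+1)*step), so 0 is always a cell boundary.
    """
    return [math.floor(v / step) for v in x]


def _truncated_exp(b, step, u):
    """Inverse-CDF draw of t ~ Exponential(scale=b) restricted to [0, step), u in [0, 1)."""
    return -b * math.log(1.0 - u * (1.0 - math.exp(-step / b)))


def sample_decoder(indices, step, b, rng):
    """Decoder: draw each sample at random from p(x | cell index).

    For the toy Laplacian source this conditional is exact; a real system would
    replace it with a learned (e.g. autoregressive) model conditioned on the indices.
    """
    out = []
    for k in indices:
        t = _truncated_exp(b, step, rng.random())
        # The Laplacian density decays away from 0, so inside a cell it is an
        # exponential decaying from the cell edge nearest to 0.
        out.append(k * step + t if k >= 0 else (k + 1) * step - t)
    return out


def mean_decoder(indices, step, b):
    """Baseline decoder: conditional mean E[x | cell index] (minimum-MSE reconstruction)."""
    r = math.exp(-step / b)
    m = b - step * r / (1.0 - r)  # mean of the truncated exponential on [0, step)
    return [k * step + m if k >= 0 else (k + 1) * step - m for k in indices]


def mse(a, c):
    """Mean squared error between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, c)) / len(a)


if __name__ == "__main__":
    rng = random.Random(0)
    b, step = 1.0, 0.5  # illustrative source scale and quantizer step
    x = laplace_source(20000, b, rng)
    q = quantize(x, step)  # the finite-rate representation that would be entropy-coded
    x_s = sample_decoder(q, step, b, rng)
    x_m = mean_decoder(q, step, b)
    print("MSE, sampling decoder:       ", mse(x, x_s))
    print("MSE, conditional-mean decoder:", mse(x, x_m))
```

The comparison exposes the trade-off that sampling makes. Given the cell, the input and its sampled reconstruction are conditionally independent and identically distributed, so E[(X - X̂)² | cell] = 2 Var(X | cell). The sampling decoder therefore has twice the MSE of the conditional-mean decoder (about 3 dB). In exchange, its output follows the source distribution, whereas the conditional-mean decoder collapses every cell to a single value.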
