Learning Bandwidth Expansion Using Perceptually-motivated Loss

We introduce a perceptually motivated approach to bandwidth expansion for speech. Our method pairs a new 3-way split variant of the FFTNet neural vocoder structure with a perceptual loss function that combines objectives from both the time and frequency domains. Mean opinion score tests show that the method outperforms baseline methods from both domains, even for extreme bandwidth expansion.
