A DNN regression approach to speech enhancement by artificial bandwidth extension

Artificial speech bandwidth extension (ABE) is an extremely effective means of speech enhancement at the receiver side of a narrowband telephony call. Initial approaches have incorporated deep neural networks (DNNs) into the estimation of the upper-band speech representation. In this paper, we propose a regression-based DNN ABE that is trained and tested on acoustically different speech databases and that exceeds coded narrowband speech by a so-far unseen 1.37 CMOS points in a subjective listening test.
