Paper information - A DNN regression approach to speech enhancement by artificial bandwidth extension

Artificial speech bandwidth extension (ABE) is an extremely effective means for speech enhancement at the receiver side of a narrowband telephony call. Early approaches have incorporated deep neural networks (DNNs) into the estimation of the upper-band speech representation. In this paper, we propose a regression-based DNN ABE that is trained and tested on acoustically different speech databases and exceeds coded narrowband speech by a so-far unseen 1.37 CMOS points in a subjective listening test.

Tim Fingscheidt | Johannes Abel
