Paper information - Artificial bandwidth extension using deep neural networks for spectral envelope estimation

Many artificial speech bandwidth extension (ABE) approaches perform source-filter decomposition of the input narrowband speech, with subsequent computation of upper frequency band (UB) spectral envelope posteriors. In this paper we perform a direct comparison of Gaussian mixture model (GMM)- and deep neural network (DNN)-based modeling of likelihoods or posteriors for ABE UB envelope estimation. DNN-based approaches turn out to significantly exceed GMM-based ones in speech quality. Further analysis reveals that this is not due to a better shape of the estimated UB spectral envelope, but primarily due to a much better estimate of the energy ratio of the upper band to the lower band, an important result with significant impact on ABE speech quality, particularly for fricative sounds.

Tim Fingscheidt | Maximilian Strake | Johannes Abel

[1] Geoffrey E. Hinton. A Practical Guide to Training Restricted Boltzmann Machines, 2012, Neural Networks: Tricks of the Trade.

[2] Peter Kabal, et al. Memory-Based Approximation of the Gaussian Mixture Model Framework for Bandwidth Extension of Narrowband Speech, 2011, INTERSPEECH.

[3] Yee Whye Teh, et al. A Fast Learning Algorithm for Deep Belief Nets, 2006, Neural Computation.

[4] Peter J. Patrick. Enhancement of band-limited speech signals, 1983.

[5] Tim Fingscheidt, et al. Reference-free SNR Measurement for Narrowband and Wideband Speech Signals in Car Noise, 2012, ITG Conference on Speech Communication.

[6] Paavo Alku, et al. Bandwidth Extension of Telephone Speech Using a Neural Network and a Filter Bank Implementation for Highband Mel Spectrum, 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[7] Bin Liu, et al. A novel method of artificial bandwidth extension using deep architecture, 2015, INTERSPEECH.

[8] Shenghui Zhao, et al. Speech bandwidth expansion based on deep neural networks, 2015, INTERSPEECH.

[9] Tara N. Sainath, et al. Fundamental Technologies in Modern Speech Recognition, 2012. DOI: 10.1109/MSP.2012.2205597.

[10] John Makhoul, et al. High-frequency regeneration in speech coding systems, 1979, ICASSP.

[11] Engin Erzin, et al. Artificial bandwidth extension of spectral envelope along a Viterbi path, 2013, Speech Commun.

[12] Carla Teixeira Lopes, et al. TIMIT Acoustic-Phonetic Continuous Speech Corpus, 2012.

[13] Roar Hagen, et al. Spectral quantization of cepstral coefficients, 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[14] W. Bastiaan Kleijn, et al. Avoiding over-estimation in bandwidth extension of telephony speech, 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings.

[15] Patrick Bauer, et al. A statistical framework for artificial bandwidth extension exploiting speech waveform and phonetic transcription, 2009, 2009 17th European Signal Processing Conference.

[16] Franz Pernkopf, et al. On representation learning for artificial bandwidth extension, 2015, INTERSPEECH.

[17] Franz Pernkopf, et al. Modeling speech with sum-product networks: Application to bandwidth extension, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[18] Paavo Alku, et al. A subjective listening test of six different artificial bandwidth extension approaches in English, Chinese, German, and Korean, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[19] Chin-Hui Lee, et al. A deep neural network approach to speech bandwidth expansion, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[20] Patrick Bauer, et al. HMM-based artificial bandwidth extension supported by neural networks, 2014, 2014 14th International Workshop on Acoustic Signal Enhancement (IWAENC).

[21] Jonathan G. Fiscus, et al. DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus CD-ROM (TIMIT), NIST, 1993.

[22] Israel Cohen, et al. Evaluation of a Speech Bandwidth Extension Algorithm Based on Vocal Tract Shape Estimation, 2012, IWAENC.

[23] Peter Jax, et al. Wideband extension of telephone speech using a hidden Markov model, 2000, 2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium.

[24] Geoffrey E. Hinton, et al. Acoustic Modeling Using Deep Belief Networks, 2012, IEEE Transactions on Audio, Speech, and Language Processing.