Artificial bandwidth extension using deep neural networks for spectral envelope estimation

Many artificial speech bandwidth extension (ABE) approaches perform a source-filter decomposition of the input narrowband speech, with subsequent computation of upper frequency band (UB) spectral envelope posteriors. In this paper we perform a direct comparison of GMM- and deep neural network (DNN)-based modeling of likelihoods or posteriors for ABE UB envelope estimation. DNN-based approaches turn out to significantly outperform GMM-based ones in speech quality. Further analysis reveals that this is not due to a better shape of the estimated UB spectral envelope, but primarily due to a much better estimate of the energy ratio of the upper band vs. the lower band, an important result with significant impact on ABE speech quality, particularly for fricative sounds.
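The energy ratio of the upper band vs. the lower band mentioned above can be illustrated with a minimal sketch. It is not the estimator used in the paper. The 16 kHz sampling rate, the 4 kHz band split, the 160-sample frame, and the function names `band_energies` and `ub_lb_energy_ratio_db` are all assumptions made for illustration, and a naive DFT stands in for whatever spectral analysis an actual ABE system would use.

```python
import math


def band_energies(frame, sample_rate, split_hz):
    """Return (lower, upper) band energies of one frame via a naive one-sided DFT."""
    n = len(frame)
    lower = 0.0
    upper = 0.0
    for k in range(n // 2 + 1):  # bins from DC up to the Nyquist frequency
        re = sum(x * math.cos(2.0 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2.0 * math.pi * k * i / n) for i, x in enumerate(frame))
        power = re * re + im * im
        if k * sample_rate / n < split_hz:
            lower += power  # bin lies in the lower band (LB)
        else:
            upper += power  # bin lies in the upper band (UB)
    return lower, upper


def ub_lb_energy_ratio_db(frame, sample_rate=16000, split_hz=4000.0, floor=1e-12):
    """UB-to-LB energy ratio in dB; sample_rate and split_hz are assumed values."""
    lower, upper = band_energies(frame, sample_rate, split_hz)
    # The floor avoids log(0) and division by zero for silent bands.
    return 10.0 * math.log10(max(upper, floor) / max(lower, floor))


# Hypothetical test frame: a 1 kHz tone in the LB plus a ten times weaker 6 kHz tone in the UB.
fs = 16000
frame = [
    math.sin(2.0 * math.pi * 1000.0 * i / fs) + 0.1 * math.sin(2.0 * math.pi * 6000.0 * i / fs)
    for i in range(160)
]
ratio_db = ub_lb_energy_ratio_db(frame, sample_rate=fs)
print(round(ratio_db, 3))  # about -20 dB, since the UB amplitude is one tenth of the LB amplitude
```

In this toy frame both tones fall exactly on DFT bins, so the ratio reflects only the squared amplitude ratio of the two tones. An ABE system that misestimates this ratio would render the upper band too loud or too quiet relative to the narrowband input, regardless of how well the envelope shape is matched.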
