A deep neural network approach to speech bandwidth expansion

We propose a deep neural network (DNN) approach to speech bandwidth expansion (BWE) that estimates the spectral mapping function from narrowband (4 kHz in bandwidth) to wideband (8 kHz in bandwidth) speech. Log-power spectra are used as the input and output features for the required nonlinear transformation, and DNNs are trained to realize this high-dimensional mapping function. When evaluating the proposed approach on a large-scale 10-hour test set, we found that the DNN-expanded speech signals achieve better objective quality, measured by segmental signal-to-noise ratio and log-spectral distortion, than conventional BWE based on Gaussian mixture models (GMMs). Subjective listening tests also give a 69% preference score for DNN-expanded speech versus 31% for GMM-expanded speech when the phase information is assumed known. In tests under realistic operating conditions, where the phase information is imaged from the given narrowband signal, the preference rises to 84% versus 16%. Correct phase recovery can further improve the BWE performance of the proposed DNN method.
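The frame-level pipeline described above (log-power features in and out, high-band phase imaged from the narrowband signal, scoring by log-spectral distortion and segmental SNR) can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: imaging is assumed to be zero-insertion upsampling by 2 with no anti-imaging filter, the DNN is replaced by a hypothetical `hb_log_power` argument, all helper names are illustrative, features are natural-log power per DFT bin without windowing or overlap-add, and the segmental-SNR clamp to [-10, 35] dB is a common convention rather than a value taken from the paper. Zero insertion gives an upsampled spectrum with Y(k) = conj(Y(N/2 - k)) for N/4 <= k <= N/2, so the high band mirrors the low band, and its phase is what `expand_frame` reuses.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT; adequate for the short illustrative frames used here."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse of dft(); returns complex samples."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def log_power(spec, eps=1e-12):
    """ln(|X_k|^2 + eps) for the non-negative-frequency bins 0..N/2."""
    return [math.log(abs(spec[k]) ** 2 + eps) for k in range(len(spec) // 2 + 1)]


def zero_insert_upsample(x):
    """Assumed imaging method: upsample by 2 with no anti-imaging filter,
    y[2n] = x[n], y[2n+1] = 0, which mirrors the narrowband spectrum
    into the upper half band."""
    y = []
    for v in x:
        y.extend([v, 0.0])
    return y


def expand_frame(nb_frame, hb_log_power):
    """Build one wideband frame (twice the sample rate) from a narrowband frame.

    Bins 0..N/4 are kept from the zero-inserted narrowband frame. Bins N/4+1..N/2
    take their magnitude from hb_log_power (a hypothetical stand-in for the DNN
    output) and their phase from the imaged spectrum. len(nb_frame) must be even.
    """
    spec = dft(zero_insert_upsample(nb_frame))
    n = len(spec)
    quarter, half = n // 4, n // 2
    if len(hb_log_power) != half - quarter:
        raise ValueError("need one log-power value per high-band bin")
    out = list(spec)
    for i, k in enumerate(range(quarter + 1, half + 1)):
        mag = math.exp(0.5 * hb_log_power[i])  # |X| = exp(ln|X|^2 / 2)
        if k == half:
            # The Nyquist bin of a real signal is real: keep only its sign.
            out[k] = mag if spec[k].real >= 0 else -mag
        else:
            out[k] = mag * cmath.exp(1j * cmath.phase(spec[k]))  # imaged phase
            out[n - k] = out[k].conjugate()  # Hermitian symmetry -> real signal
    return [v.real for v in idft(out)]


def log_spectral_distortion(ref_lp, est_lp):
    """Per-frame LSD in dB between two ln-power spectra of equal length."""
    to_db = 10.0 / math.log(10.0)  # ln power -> 10*log10 power
    sq = [(to_db * (a - b)) ** 2 for a, b in zip(ref_lp, est_lp)]
    return math.sqrt(sum(sq) / len(sq))


def segmental_snr(ref, est, frame_len, lo=-10.0, hi=35.0):
    """Mean per-frame SNR in dB over non-overlapping frames, each clamped to [lo, hi]."""
    snrs = []
    for start in range(0, len(ref) - frame_len + 1, frame_len):
        s, e = ref[start:start + frame_len], est[start:start + frame_len]
        sig = sum(v * v for v in s)
        err = sum((a - b) ** 2 for a, b in zip(s, e))
        if err == 0:
            snr = hi
        elif sig == 0:
            snr = lo
        else:
            snr = 10.0 * math.log10(sig / err)
        snrs.append(min(max(snr, lo), hi))
    return sum(snrs) / len(snrs)


# Usage: one 8-sample frame at 8 kHz expanded to 16 samples at 16 kHz.
nb = [0.3, -1.2, 0.7, 0.25, -0.4, 1.1, 0.05, -0.6]
wb = expand_frame(nb, [0.0, -0.5, -1.0, -1.5])
```

A real system would apply this to every windowed short-time frame and resynthesize by overlap-add; the naive DFT is used only to keep the sketch free of dependencies.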