HMM-based artificial bandwidth extension supported by neural networks

In telephony applications, artificial bandwidth extension (ABE) can be applied to narrowband (NB) calls to enhance speech quality and intelligibility. However, extending the high band is challenging because the lower and upper frequency bands of speech share too little mutual information. As a consequence, estimation errors occur, particularly for the fricatives /s, z/, and lead to annoying artifacts such as lisping. In this paper, two neural networks are employed to support an HMM-based ABE: the first detects /s, z/ phonemes to assist the estimation process, while the second corrects the estimated high-band energy. In an absolute category rating (ACR) test, the proposed ABE attains significantly better speech quality than NB speech. A comparison category rating (CCR) test confirms this result, showing a speech quality gain of 1.0 comparison mean opinion score (CMOS) points over NB speech.
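To make the CMOS figure concrete, the following minimal sketch shows how a comparison mean opinion score is typically obtained from CCR votes. It assumes the usual seven-point comparison scale from -3 (much worse) to +3 (much better). The function name `cmos` and the vote list are hypothetical and purely illustrative; they are not data or code from the paper.

```python
def cmos(votes):
    """Return the comparison mean opinion score for a list of CCR votes.

    Assumption: each vote is an integer on the seven-point scale
    -3 (much worse) ... 0 (about the same) ... +3 (much better).
    """
    if not votes:
        raise ValueError("at least one vote is required")
    for v in votes:
        if v not in range(-3, 4):
            raise ValueError(f"vote {v!r} is outside the -3..+3 scale")
    # CMOS is the arithmetic mean of all comparison votes.
    return sum(votes) / len(votes)


# Hypothetical listener votes comparing ABE-processed speech against NB speech.
example_votes = [2, 1, 1, 0, 1, 2, 0, 1]
print(cmos(example_votes))
```

A positive value means that listeners, on average, preferred the processed condition over the reference condition.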
