Bandwidth Extension of Telephone Speech Using a Neural Network and a Filter Bank Implementation for Highband Mel Spectrum

The limited audio bandwidth used in narrowband telephone systems degrades both the quality and the intelligibility of speech. This paper presents a new method for the bandwidth extension of telephone speech. Frequency components are added to the frequency band 4-8 kHz using only the information in the narrowband speech. A neural network is used to estimate the mel spectrum in the extension band in short time frames based on features calculated from the narrowband speech. A wideband excitation signal is generated by spectral folding from the narrowband linear prediction residual and a filter bank is utilized to divide the excitation into four sub-bands that cover the extension band. These sub-bands are weighted such that the estimated mel spectrum is realized. Bandwidth-extended speech is obtained by summing the weighted sub-bands and the original narrowband signal. Listening tests show that this new method improves speech quality compared with narrowband telephone speech and with a previously published bandwidth extension method.
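The synthesis chain described above (spectral folding, a four-band filter bank, sub-band weighting, and summation) can be illustrated with a minimal sketch. It is not the authors' implementation: the 16 kHz output rate, the uniform 1 kHz band edges, the FFT-mask filter bank, and all function names are assumptions made for illustration. The sub-band gains, which in the paper come from the neural network's mel spectrum estimate, are treated here as given inputs.

```python
import numpy as np

FS_WB = 16000  # assumed wideband sampling rate in Hz
# Assumed uniform band edges covering 4-8 kHz; the paper's actual
# filter bank design may differ (for example, mel-spaced bands).
BAND_EDGES_HZ = (4000, 5000, 6000, 7000, 8000)


def spectral_fold(residual_nb):
    """Upsample by 2 with zero insertion.

    Zero insertion mirrors the 0-4 kHz spectrum of an 8 kHz signal
    into 4-8 kHz, giving a wideband excitation at 16 kHz.
    """
    residual_nb = np.asarray(residual_nb, dtype=float)
    folded = np.zeros(2 * residual_nb.size)
    folded[::2] = residual_nb
    return folded


def split_highband(excitation, fs=FS_WB, edges=BAND_EDGES_HZ):
    """Split the excitation into sub-bands with ideal FFT masks.

    This is a stand-in for a real filter bank, chosen for brevity.
    """
    spectrum = np.fft.rfft(excitation)
    freqs = np.fft.rfftfreq(excitation.size, d=1.0 / fs)
    subbands = []
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        is_last = i == len(edges) - 2
        upper = (freqs <= hi) if is_last else (freqs < hi)
        mask = (freqs >= lo) & upper
        subbands.append(np.fft.irfft(spectrum * mask, n=excitation.size))
    return np.stack(subbands)


def extend(narrowband_wb, subbands, gains):
    """Sum the weighted sub-bands and the narrowband signal.

    narrowband_wb is assumed to be already interpolated to 16 kHz.
    """
    gains = np.asarray(gains, dtype=float)
    return narrowband_wb + gains @ subbands


# Toy input: a 1.5 kHz tone stands in for the narrowband LP residual.
t = np.arange(800) / 8000.0
residual = np.sin(2 * np.pi * 1500 * t)
subbands = split_highband(spectral_fold(residual))
energies = (subbands ** 2).sum(axis=1)
```

In this toy case the folded image of the 1.5 kHz tone lands at 6.5 kHz, so nearly all of the highband energy falls in the third sub-band. In the method itself, the gains would be updated frame by frame so that the sub-band energies match the estimated mel spectrum.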
