Paper information - Bandwidth Extension of Telephone Speech Using a Neural Network and a Filter Bank Implementation for Highband Mel Spectrum


The limited audio bandwidth used in narrowband telephone systems degrades both the quality and the intelligibility of speech. This paper presents a new method for the bandwidth extension of telephone speech. Frequency components are added to the frequency band 4-8 kHz using only the information in the narrowband speech. A neural network is used to estimate the mel spectrum in the extension band in short time frames based on features calculated from the narrowband speech. A wideband excitation signal is generated by spectral folding from the narrowband linear prediction residual and a filter bank is utilized to divide the excitation into four sub-bands that cover the extension band. These sub-bands are weighted such that the estimated mel spectrum is realized. Bandwidth-extended speech is obtained by summing the weighted sub-bands and the original narrowband signal. Listening tests show that this new method improves speech quality compared with narrowband telephone speech and with a previously published bandwidth extension method.
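The synthesis chain described in the abstract (spectral folding, sub-band splitting, sub-band weighting) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the uniform 1 kHz band edges, the FFT-mask band splitting (a stand-in for a real filter bank), and the target energies (a stand-in for the neural network's mel-spectrum estimate) are all assumptions made for illustration. Linear prediction analysis and the final summation with the narrowband signal are omitted.

```python
import numpy as np

def spectral_fold(x_nb):
    """Upsample by 2 with zero insertion, which mirrors the
    0-4 kHz spectrum of an 8 kHz signal into 4-8 kHz."""
    y = np.zeros(2 * len(x_nb))
    y[::2] = x_nb
    return y

def split_highband(x_wb, fs=16000, edges=(4000, 5000, 6000, 7000, 8000)):
    """Split the extension band into sub-bands with FFT masks.
    The uniform edges are hypothetical, not taken from the paper."""
    spec = np.fft.rfft(x_wb)
    freqs = np.fft.rfftfreq(len(x_wb), d=1.0 / fs)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        bands.append(np.fft.irfft(spec * mask, n=len(x_wb)))
    return bands

def weight_bands(bands, target_energies):
    """Scale each sub-band so its energy matches the target, then sum."""
    out = np.zeros_like(bands[0])
    for band, target in zip(bands, target_energies):
        energy = np.sum(band ** 2)
        gain = np.sqrt(target / energy) if energy > 0 else 0.0
        out += gain * band
    return out

# One 20 ms frame: random noise stands in for the narrowband LP residual.
rng = np.random.default_rng(0)
residual_nb = rng.standard_normal(160)       # 160 samples at 8 kHz
excitation_wb = spectral_fold(residual_nb)   # 320 samples at 16 kHz
bands = split_highband(excitation_wb)
target = [4.0, 3.0, 2.0, 1.0]                # hypothetical sub-band energies
highband = weight_bands(bands, target)
```

In a complete system, `highband` would be added to the narrowband signal interpolated to 16 kHz to produce the bandwidth-extended frame. Overlapping frames and gain smoothing between frames would also be needed to avoid audible discontinuities.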

Paavo Alku | Hannu Pulakka
