Paper information - Sequential Deep Neural Networks Ensemble for Speech Bandwidth Extension

In this paper, we propose a subband-based ensemble of sequential deep neural networks (DNNs) for bandwidth extension (BWE). First, the narrow-band spectra are folded into the high-band (HB) region to generate the HB spectra, and then the energy levels of the HB spectra are adjusted using DNNs that operate on log-power spectra features. For this, we build multiple DNNs, each responsible for one subband of the HB, and connect the DNN ensemble sequentially from lower to higher subbands. This sequential structure carries out denoising and HB regression to better estimate the HB energy levels. In addition, we use voiced/unvoiced (V/UV) classification so that the DNN ensemble is applied differently to voiced and unvoiced sounds. To demonstrate the performance of the proposed BWE algorithm, we compare it with a speech production model-based BWE system and with a DNN-based BWE system in which the log-power spectra in the HB are estimated directly. The experimental results show that the proposed approach provides better speech quality than these conventional approaches.
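The two-stage idea in the abstract (spectral folding followed by per-subband energy adjustment, with subbands processed from low to high) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the function names `fold_spectrum` and `adjust_hb_energy`, the use of plain callables as stand-ins for trained DNNs, the equal-width subband split, and the choice to feed each subband predictor the narrow-band log-power spectrum plus the energies already estimated for lower subbands. The denoising step and the V/UV switching are omitted.

```python
import numpy as np


def fold_spectrum(nb_mag):
    """Mirror the narrow-band magnitude spectrum to obtain a raw HB spectrum."""
    return np.asarray(nb_mag, dtype=float)[::-1].copy()


def adjust_hb_energy(nb_mag, predictors, eps=1e-12):
    """Scale each folded HB subband to a predicted log-energy.

    predictors: one callable per HB subband, lowest subband first.
    Each callable is a hypothetical stand-in for a trained DNN. It receives
    the NB log-power spectrum concatenated with the log-energies already
    estimated for the lower subbands (an assumed form of the sequential
    connection) and returns the target log-energy of its own subband.
    """
    nb_mag = np.asarray(nb_mag, dtype=float)
    hb = fold_spectrum(nb_mag)
    nb_lps = np.log(nb_mag ** 2 + eps)  # log-power spectra feature

    # Assumed layout: equal-width subbands covering the whole high band.
    bands = np.array_split(np.arange(hb.size), len(predictors))

    estimated = []  # log-energies of the subbands processed so far
    for idx, predict in zip(bands, predictors):
        features = np.concatenate([nb_lps, np.asarray(estimated, dtype=float)])
        target_log_energy = float(predict(features))

        # Gain that moves the folded subband energy to the predicted energy.
        current_energy = np.sum(hb[idx] ** 2) + eps
        hb[idx] *= np.sqrt(np.exp(target_log_energy) / current_energy)

        estimated.append(target_log_energy)

    # Wide-band magnitude spectrum: original NB bins followed by adjusted HB bins.
    return np.concatenate([nb_mag, hb])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    nb = np.abs(rng.standard_normal(16)) + 0.1
    # Dummy predictors: each simply asks for a fixed log-energy.
    dummy = [lambda f, t=t: t for t in (0.0, -0.5, -1.0, -1.5)]
    wb = adjust_hb_energy(nb, dummy)
    print(wb.shape)
```

In this sketch the folded spectrum supplies the fine structure of the high band, while the predictors only decide how much energy each subband should carry. Replacing the dummy callables with trained regressors would not change the surrounding logic.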
