Speech enhancement based on analysis-synthesis framework with improved pitch estimation and spectral envelope enhancement

This paper presents a speech enhancement approach based on analysis-synthesis framework. An improved multi-band summary correlogram (MBSC) algorithm is proposed for pitch estimation and voiced/unvoiced (V/UV) detection. The proposed pitch detection algorithm achieves a lower pitch detection error compared with the reference algorithm. The denoising autoencoder (DAE) is applied to enhance the line spectrum frequencies (LSFs). The reconstruction loss could be decreased compare with the swallow model. The proposed approach is evaluated using the perceptual evaluation of speech quality (PESQ) and the experimental results show that the proposed approach improves the performance of speech enhancement compared with the conventional speech enhancement approach. In addition, it could be applied to parametric speech coding even at low bit rate and low SNR environments.

[1]  Abeer Alwan,et al.  Multi-band summary correlogram-based pitch detection for noisy speech , 2013, Speech Commun..

[2]  Thippur V. Sreenivas,et al.  Codebook constrained Wiener filtering for speech enhancement , 1996, IEEE Trans. Speech Audio Process..

[3]  Hideki Kawahara,et al.  YIN, a fundamental frequency estimator for speech and music. , 2002, The Journal of the Acoustical Society of America.

[4]  Kuldip K. Paliwal,et al.  Speech Coding and Synthesis , 1995 .

[5]  Hyung Soon Kim,et al.  Narrowband to wideband conversion of speech using GMM based transformation , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[6]  Abeer Alwan,et al.  Noise-robust F0 estimation using SNR-weighted summary correlograms from multi-band comb filters , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[7]  Harald Höge,et al.  Evaluation of Pitch Detection Algorithms in Adverse Conditions , 2006 .

[8]  Carla Teixeira Lopes,et al.  TIMIT Acoustic-Phonetic Continuous Speech Corpus , 2012 .

[9]  David Malah,et al.  Speech enhancement using a minimum mean-square error log-spectral amplitude estimator , 1984, IEEE Trans. Acoust. Speech Signal Process..

[10]  Benoît Champagne,et al.  Incorporating the human hearing properties in the signal subspace approach for speech enhancement , 2003, IEEE Trans. Speech Audio Process..

[11]  Jean Rouat,et al.  A pitch determination and voiced/unvoiced decision algorithm for noisy speech , 1995, Speech Commun..

[12]  S. Boll,et al.  Suppression of acoustic noise in speech using spectral subtraction , 1979 .

[13]  K. Shikano,et al.  Voice conversion algorithm based on Gaussian mixture model with dynamic frequency warping of STRAIGHT spectrum , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[14]  Abeer Alwan,et al.  Reducing F0 Frame Error of F0 tracking algorithms under noisy conditions with an unvoiced/voiced classification frontend , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[15]  Pascal Vincent,et al.  Generalized Denoising Auto-Encoders as Generative Models , 2013, NIPS.

[16]  Ephraim Speech enhancement using a minimum mean square error short-time spectral amplitude estimator , 1984 .

[17]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[18]  John G Harris,et al.  A sawtooth waveform inspired pitch estimator for speech and music. , 2008, The Journal of the Acoustical Society of America.

[19]  Jonathan G. Fiscus,et al.  Darpa Timit Acoustic-Phonetic Continuous Speech Corpus CD-ROM {TIMIT} | NIST , 1993 .

[20]  Hing-Cheung So,et al.  Model-Based Speech Enhancement With Improved Spectral Envelope Estimation via Dynamics Tracking , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[21]  Andries P. Hekstra,et al.  Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[22]  John S. Collura,et al.  MELP: the new Federal Standard at 2400 bps , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[23]  Hing-Cheung So,et al.  Noise suppression based on an analysis-synthesis approach , 2010, 2010 18th European Signal Processing Conference.