Non-linear encoding of the excitation source using neural networks for transition mode coding in CELP

When a frame suffers erasure, the adaptive codebook at the decoder falls out of sync with the one at the encoder. When the erased frame follows a voice-onset frame, this loss of codebook synchronization severely degrades the quality of the decoded speech, primarily because no meaningful excitation signal is present in the adaptive codebook. To deal with this problem of frame drops during transition regions, a transition mode frame is proposed that differs from the conventional CELP frame without altering the bit-rate. An autoassociative neural network (AANN) with a compression layer is used to capture the characteristics of the excitation source around the glottal closure instants (GCIs). In these transition mode frames, the compressed representation of the excitation source around the GCIs, obtained through the AANN, is used to reconstruct the adaptive codebook at the receiver. It is shown that the proposed method improves the quality of the decoded speech.
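The sketch below illustrates the general idea of an autoassociative neural network with a compression (bottleneck) layer: the network is trained to reproduce its own input, so the bottleneck activations form a compact code for the GCI-centred excitation segment. This is a minimal illustration in PyTorch under assumed settings; the layer sizes, the 40-sample segment length, and the training details are placeholders, not the configuration used in the paper.

```python
# Minimal AANN (autoencoder) sketch with a compression layer.
# Assumptions: 40-sample excitation segments centred on GCIs and an
# 8-dimensional bottleneck; both are illustrative, not the paper's values.
import torch
import torch.nn as nn

class AANN(nn.Module):
    def __init__(self, seg_len=40, bottleneck=8):
        super().__init__()
        # Encoder maps the excitation segment to a compact code.
        self.encoder = nn.Sequential(
            nn.Linear(seg_len, 20), nn.Tanh(),
            nn.Linear(20, bottleneck), nn.Tanh(),  # compression layer
        )
        # Decoder reconstructs the segment from the code.
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, 20), nn.Tanh(),
            nn.Linear(20, seg_len),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train(model, segments, epochs=50, lr=1e-3):
    # Autoassociative training: the target equals the input, so the
    # network minimises the reconstruction error of the segments.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(segments), segments)
        loss.backward()
        opt.step()
    return model

if __name__ == "__main__":
    # Synthetic stand-in for GCI-centred excitation segments.
    segments = torch.randn(256, 40)
    model = train(AANN(), segments)
    # The bottleneck output is the compressed representation that a
    # transition mode frame could carry to the decoder.
    codes = model.encoder(segments)
    print(codes.shape)  # torch.Size([256, 8])
```

In this picture, only the bottleneck code would need to be transmitted in the transition mode frame; the decoder-side copy of the network then regenerates the excitation segment to rebuild the adaptive codebook.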
