The affine transform and feature fusion for robust speaker identification in the presence of speech coding distortion

For security in wireless, voice over IP and cellular telephony applications, there is an emerging need for speaker identification (SID) systems that are robust to speech coding distortion. This paper examines the robustness issue for the 8 kilobits/second ITU-T G.729 codec. The SID system is trained on clean speech and tested on speech decoded by the G.729 codec. To mitigate the performance loss caused by mismatched training and testing conditions, five features are considered and two approaches are used. Four of the five features are based on linear prediction analysis and the fifth is the mel frequency cepstrum. The first approach is feature compensation based on the affine transform, which maps the features from the testing condition to the training condition. The second approach is feature fusion based on the arithmetic combination of probabilities generated by the vector quantizer classifier. Combining the affine transform with the fusion of four features gives the best identification success rate (ISR) of 83.2%. The best performing single feature achieves an ISR of 70.5% without the affine transform and 77.4% with it.
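The two approaches can be illustrated with a minimal sketch on toy data. It is not the paper's implementation. The function names, the hand-picked matrix `A` and offset `b`, the softmax mapping from vector quantizer distortion to probability, and the unweighted mean used for fusion are all assumptions made for illustration. The abstract does not state how the affine parameters are estimated or how the probabilities are formed.

```python
import math

def affine_map(x, A, b):
    """Apply y = A x + b to one feature vector (plain lists of floats)."""
    return [sum(a * xj for a, xj in zip(row, x)) + bi for row, bi in zip(A, b)]

def vq_distortion(frames, codebook):
    """Mean squared distance from each frame to its nearest codeword."""
    total = 0.0
    for f in frames:
        total += min(sum((fi - ci) ** 2 for fi, ci in zip(f, c)) for c in codebook)
    return total / len(frames)

def distortions_to_probs(distortions):
    """Assumed mapping: softmax of negative distortion across speakers."""
    m = min(distortions)
    w = [math.exp(-(d - m)) for d in distortions]
    s = sum(w)
    return [wi / s for wi in w]

def fuse(per_feature_probs):
    """Assumed arithmetic fusion: unweighted mean of the probability vectors."""
    n = len(per_feature_probs)
    return [sum(p[k] for p in per_feature_probs) / n
            for k in range(len(per_feature_probs[0]))]

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

# Toy codebooks "trained on clean speech" for two hypothetical speakers.
codebooks = [[[0.0, 0.0], [1.0, 1.0]],   # speaker 0
             [[4.0, 4.0], [5.0, 5.0]]]   # speaker 1

# Test frames from speaker 1 after a made-up distortion x_test = 0.5*x_clean - 0.5.
test_frames = [[1.5, 1.5], [2.0, 2.0]]

# Hand-picked inverse of that distortion; in practice A and b would be estimated.
A = [[2.0, 0.0], [0.0, 2.0]]
b = [1.0, 1.0]

# Without compensation the mismatched frames sit closer to speaker 0's codebook.
raw_probs = distortions_to_probs([vq_distortion(test_frames, cb) for cb in codebooks])
raw_decision = argmax(raw_probs)

# With compensation the frames are mapped back toward the training condition.
compensated = [affine_map(f, A, b) for f in test_frames]
comp_probs = distortions_to_probs([vq_distortion(compensated, cb) for cb in codebooks])
comp_decision = argmax(comp_probs)

# Fuse with a second, hypothetical feature stream that mildly favors speaker 0.
other_feature_probs = [0.6, 0.4]
fused_probs = fuse([comp_probs, other_feature_probs])
fused_decision = argmax(fused_probs)

print(raw_decision, comp_decision, fused_decision)
```

In this toy setup the uncompensated frames are assigned to the wrong speaker, while the compensated frames and the fused score both select the correct one. Real systems would use one codebook per speaker per feature and learn the affine parameters from paired clean and decoded speech.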
