Improving robustness to compressed speech in speaker recognition

The goal of this paper is to analyze the impact of codecdegraded speech on a state-of-the-art speaker recognition system and propose mitigation techniques. Several acoustic features are analyzed, including the standard Mel filterbank cepstral coefficients (MFCC), as well as the noise-robust medium duration modulation cepstrum (MDMC) and power normalized cepstral coefficients (PNCC), to determine whether robustness to noise generalizes to audio compression. Using a speaker recognition system based on i-vectors and probabilistic linear discriminant analysis (PLDA), we compared four PLDA training scenarios. The first involves training PLDA on clean data, the second included additional noisy and reverberant speech, a third introduces transcoded data matched to the evaluation conditions and the fourth, using codec-degraded speech mismatched to the evaluation conditions. We found that robustness to compressed speech was marginally improved by exposing PLDA to noisy and reverberant speech, with little improvement using trancoded speech in PLDA based on codecs mismatched to the evaluation conditions. Noise-robust features offered a degree of robustness to compressed speech while more significant improvements occurred when PLDA had observed the codec matching the evaluation conditions. Finally, we tested i-vector fusion from the different features, which increased overall system performance but did not improve robustness to codec-degraded speech. Index Terms: speaker recognition, speech coding, codec degradation, speaker verification.

[1]  Arindam Mandal,et al.  Normalized amplitude modulation features for large vocabulary noise-robust speech recognition , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[2]  Lukás Burget,et al.  iVector Fusion of Prosodic and Cepstral Features for Speaker Verification , 2011, INTERSPEECH.

[3]  Richard M. Stern,et al.  Feature extraction for robust speech recognition based on maximizing the sharpness of the power distribution and on power flooring , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[4]  Lou Boves,et al.  Speaker verification with GSM coded telephone speech , 1997, EUROSPEECH.

[5]  L. Burget,et al.  Promoting robustness for speaker modeling in the community: the PRISM evaluation set , 2011 .

[6]  Manfred R. Schroeder,et al.  Code-excited linear prediction(CELP): High-quality speech at very low bit rates , 1985, ICASSP '85. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[7]  Patrick Kenny,et al.  Front-End Factor Analysis for Speaker Verification , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[8]  Richard M. Stern,et al.  Robust speech recognition using a Small Power Boosting algorithm , 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.

[9]  James H. Elder,et al.  Probabilistic Linear Discriminant Analysis for Inferences About Identity , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[10]  Fausto Pellandini,et al.  Influence of GSM speech coding on the performance of text-independent speaker recognition , 2000, 2000 10th European Signal Processing Conference.

[11]  Thomas E. Tremain,et al.  The federal standard 1016 4800 bps CELP voice coder , 1991, Digit. Signal Process..

[12]  Mark Phythian,et al.  Effects of speech coding on text-dependent speaker recognition , 1997, TENCON '97 Brisbane - Australia. Proceedings of IEEE TENCON '97. IEEE Region 10 Annual Conference. Speech and Image Technologies for Computing and Telecommunications (Cat. No.97CH36162).

[13]  T.F. Quatieri,et al.  Speaker recognition from coded speech and the effects of score normalization , 2001, Conference Record of Thirty-Fifth Asilomar Conference on Signals, Systems and Computers (Cat.No.01CH37256).

[14]  Douglas A. Reynolds,et al.  Speaker recognition using G.729 speech codec parameters , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).