Assessment of automatic speaker verification on lossy transcoded speech

In this paper, we investigate the effect of lossy speech compression on the text-independent speaker verification task. We evaluate voice biometrics performance over several state-of-the-art speech codecs, including the recently released Enhanced Voice Services (EVS) codec. The tests were performed in both codec-matched and codec-mismatched scenarios. The results show that EVS outperforms the other speech codecs used in our tests and that it can be used to generate speaker models that are quite robust to varying compression levels. They also show that if a higher-quality speech codec (EVS, G.711) is included in the training data (mismatched and partially mismatched scenarios), automatic speaker verification (ASV) gives better results than in the matched scenario.
