Maximum entropy PLDA for robust speaker recognition under speech coding distortion

Systems that combine i-vectors with probabilistic linear discriminant analysis (PLDA) have been applied with great success to speaker recognition. The i-vector gives a low-dimensional representation of a speech segment and serves as training data for the PLDA model, which offers greater robustness under varying conditions. In this paper, we propose a new framework based on i-vector/PLDA and Maximum Entropy (ME) to improve the performance of a speaker identification system in the presence of speech coding distortion. Results are reported on the TIMIT database, with coded speech obtained by passing the TIMIT test speech through the AMR encoder/decoder. Our results show that the proposed method achieves improved performance compared with i-vector/PLDA and MEGMM.
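For readers unfamiliar with how a PLDA back-end compares two i-vectors, the following is a minimal sketch of log-likelihood-ratio scoring under a generic two-covariance Gaussian PLDA model. It is not the framework proposed in this paper and contains no Maximum Entropy component. The function names, the toy 2-dimensional vectors, and the covariance values are assumptions chosen for illustration only; in practice the mean and the between- and within-speaker covariances would be estimated from training i-vectors.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate Gaussian, evaluated at x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + maha)

def plda_llr(x1, x2, mu, B, W):
    """Log-likelihood ratio: same speaker vs. different speakers.

    Assumed model (two-covariance PLDA): x = y + e, with speaker
    variable y ~ N(mu, B) and session noise e ~ N(0, W).
    """
    T = B + W                       # total covariance of one i-vector
    zeros = np.zeros_like(B)
    pair = np.concatenate([x1, x2])
    mean = np.concatenate([mu, mu])
    # Same speaker: both vectors share y, so they co-vary through B.
    cov_same = np.block([[T, B], [B, T]])
    # Different speakers: the two vectors are independent.
    cov_diff = np.block([[T, zeros], [zeros, T]])
    return gaussian_logpdf(pair, mean, cov_same) - gaussian_logpdf(pair, mean, cov_diff)

# Toy values (hypothetical, not taken from the paper).
mu = np.zeros(2)
B = 2.0 * np.eye(2)   # between-speaker covariance
W = 0.5 * np.eye(2)   # within-speaker covariance
enrol = np.array([1.0, 0.5])

score_match = plda_llr(enrol, enrol, mu, B, W)      # test vector equals enrolment
score_mismatch = plda_llr(enrol, -enrol, mu, B, W)  # test vector points the opposite way
print(score_match > score_mismatch)
```

In an identification setting, a test i-vector would be scored against each enrolled speaker in this way and assigned to the speaker with the highest score.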
