Voice source waveforms for utterance level speaker identification using support vector machines

The voice source waveform generated by the periodic motion of the vocal folds during voiced speech has yet to be fully utilised in automatic speaker recognition systems. Continuing our investigation into how much speaker-discriminatory information is present in a data-driven parameterisation of the voice source waveform obtained by closed-phase inverse filtering, we perform closed-set speaker identification experiments on the YOHO speech corpus. Discriminative modelling using support vector machines gave utterance-level correct identification rates of 85.3% with a multi-class model and 72.5% with a binary, one-against-all regression model, each on cohorts of 20 speakers. These results compare well with other speaker identification experiments in the literature that employ features derived from the voice source waveform, and they are encouraging under the hypothesis that such features should be complementary to the common magnitude spectral parameters (mel-cepstra).
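To make the notion of an utterance-level decision concrete, the following is a minimal sketch of how per-waveform classifier outputs might be pooled into a single speaker decision for each utterance. The abstract does not state the pooling rule used, so both rules here (majority voting for the multi-class model, mean score for the one-against-all regression model) are assumptions, and all function names are hypothetical. The classifier outputs are taken as given; no SVM training is shown.

```python
from collections import Counter


def identify_by_vote(frame_labels):
    """Multi-class case (assumed rule): each voice source waveform in the
    utterance votes for one speaker; the most-voted speaker wins."""
    counts = Counter(frame_labels)
    best = max(counts.values())
    # Break ties deterministically by choosing the smallest speaker id.
    return min(s for s, c in counts.items() if c == best)


def identify_by_mean_score(frame_scores):
    """One-against-all case (assumed rule): frame_scores[i][s] is the output
    of speaker s's regression model on waveform i; the speaker with the
    highest mean output over the utterance wins."""
    n_speakers = len(frame_scores[0])
    totals = [0.0] * n_speakers
    for scores in frame_scores:
        for s, value in enumerate(scores):
            totals[s] += value
    means = [t / len(frame_scores) for t in totals]
    return max(range(n_speakers), key=lambda s: means[s])


def identification_rate(predicted, actual):
    """Percentage of utterances assigned to the correct speaker."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)
```

In a closed-set experiment, `identification_rate` would be computed over all test utterances of the cohort, with one predicted speaker per utterance from either pooling rule.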
