Text-Independent Speaker Verification from Mixed Speech of Multiple Speakers via Using Pole Distribution of Speech Signals

This paper presents a method for text-independent speaker verification from mixed speech of multiple speakers using the pole distribution of speech signals. The poles, derived from an all-pole speech production model, are obtained with a neural net called bagging CAN2 (competitive associative net 2), which learns an efficient piecewise linear approximation of a nonlinear function. We present an analysis indicating that the poles of mixed speech are expected to consist of those poles that lie farther from the zeros of the ARMA (autoregressive moving average) models of the constituent speech signals. Experiments with unmixed and mixed speech show that the pole distribution has two typical regions: in one, the poles change abruptly when the speech changes from unmixed to mixed; in the other, the poles change continuously with the mixing weight. This observation is considered to support the analysis. In speaker verification experiments, we use recall and precision as measures of verification performance and observe the following: recall drops abruptly when the speech changes from unmixed to mixed, whereas precision does not decrease much as the SNR (signal-to-noise ratio) decreases, until the SNR falls below 0 dB. Finally, we show the usefulness of the proposed method.
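
As a rough illustration of what "poles of an all-pole model" and "mixing at a given SNR" mean, the following Python sketch fits a single linear autoregressive predictor by ordinary least squares and takes the roots of its characteristic polynomial. This is a deliberate simplification and not the method of the paper: bagging CAN2 learns a piecewise linear approximation, whereas this sketch uses one global linear unit. The function names `ar_poles` and `mix`, the model order, and the synthetic test signal are all assumptions made for illustration.

```python
import numpy as np

def ar_poles(signal, order):
    """Fit one linear AR predictor by least squares and return its poles.

    Simplification: a single global linear model, not a piecewise linear one.
    """
    x = np.asarray(signal, dtype=float)
    # The row for time t holds the previous samples x[t-1], ..., x[t-order].
    X = np.array([x[t - order:t][::-1] for t in range(order, len(x))])
    y = x[order:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Poles are the roots of z^k - a_1 z^(k-1) - ... - a_k.
    return np.roots(np.concatenate(([1.0], -a)))

def mix(target, interferer, snr_db):
    """Add an interferer scaled so the target-to-interferer power ratio is snr_db."""
    target = np.asarray(target, dtype=float)
    interferer = np.asarray(interferer, dtype=float)
    p_target = np.mean(target ** 2)
    p_interf = np.mean(interferer ** 2)
    gain = np.sqrt(p_target / (p_interf * 10 ** (snr_db / 10)))
    return target + gain * interferer

# Hypothetical demo: a damped sinusoid is an exact order-2 AR signal.
t = np.arange(200)
demo = 0.95 ** t * np.cos(0.3 * np.pi * t)
print(ar_poles(demo, order=2))
```

For the damped sinusoid in the demo, the two estimated poles should lie close to the conjugate pair with radius 0.95 and angle 0.3π. Real speech would need a higher order and frame-wise estimation, which this sketch does not attempt.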
