A New Segmentation Algorithm Combined with Transient Frames Power for Text Independent Speaker Verification

In this paper we propose a new segmentation algorithm, delta-MFCC-based speech segmentation (DMFCC-SS), with application to speaker recognition systems. We show that DMFCC-SS can separate regions of speech that yield similar likelihood scores under models such as a Gaussian mixture model (GMM), and can therefore be used to identify the regions of speech between two transitional states in a speech signal. By combining this segmentation algorithm with the discriminative power of transient frames in speaker recognition, we investigate the tradeoff between the speed-up rates obtained with DMFCC-SS and the speaker verification equal error rates obtained when each segment is scored through its representatives. We use a universal background model Gaussian mixture model (UBM-GMM) as the baseline system. The proposed speed-up algorithm, which works in the pre-processing stage, performs well while adding almost no computational load relative to the main GMM system. Experimental results show that this pre-processing method outperforms other algorithms that work in the pre-processing stage of a UBM-GMM system.
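The abstract does not spell out the segmentation rule, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' DMFCC-SS. It assumes that frames whose delta-feature norm exceeds a threshold are "transient", that the runs between them are the quasi-stationary segments, that every transient frame is kept, and that each stationary segment is represented by its middle frame. The function names (`delta`, `split_runs`, `select_frames`), the `threshold` and `width` parameters, and the choice of the middle frame are all hypothetical.

```python
import numpy as np


def delta(features, width=2):
    """Standard regression-based delta over +/- `width` frames.

    `features` is a (frames, coefficients) array, e.g. MFCCs.
    Edges are handled by repeating the first and last frame.
    """
    n = len(features)
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for k in range(1, width + 1):
        ahead = padded[width + k : width + k + n]    # features[t + k]
        behind = padded[width - k : width - k + n]   # features[t - k]
        out += k * (ahead - behind)
    return out / denom


def split_runs(flags):
    """Return (start, end, flag) for each maximal run of equal flags (end exclusive)."""
    flags = np.asarray(flags, dtype=bool)
    if flags.size == 0:
        return []
    change = np.flatnonzero(flags[1:] != flags[:-1]) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [flags.size]))
    return [(int(s), int(e), bool(flags[s])) for s, e in zip(starts, ends)]


def select_frames(features, threshold, width=2):
    """Pick the frame indices to score (assumed rule, see lead-in).

    Transient runs (delta norm above `threshold`) are kept whole;
    each stationary run contributes only its middle frame.
    """
    delta_norm = np.linalg.norm(delta(features, width), axis=1)
    keep = []
    for start, end, is_transient in split_runs(delta_norm > threshold):
        if is_transient:
            keep.extend(range(start, end))
        else:
            keep.append((start + end) // 2)
    return np.array(keep, dtype=int)
```

Under these assumptions, only the returned frames would be passed to the GMM scoring stage, so the speed-up is roughly the ratio of total frames to selected frames, while the threshold controls how aggressively stationary regions are collapsed.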
