Speaker recognition based on telephone quality short Polish sequences with removed silence

This paper evaluates the effectiveness of speaker identification based on short Polish sequences. The impact of automatic silence removal on speaker recognition accuracy is examined, using several methods to detect the beginnings and ends of speech within the signal. The experiments were carried out in the Matlab environment on a specially prepared database of short speech sequences in Polish. Speaker models were built with two techniques: Vector Quantization (VQ) and Gaussian Mixture Models (GMM). The influence of sampling rate reduction on speaker recognition performance was also tested.
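The abstract does not say which endpoint detection methods were compared, so the following is only a minimal sketch of one common approach, a short-time energy threshold, written in Python rather than the Matlab used in the study. The function name `remove_silence`, the 160-sample frame (20 ms at an assumed 8 kHz telephone sampling rate), and the -35 dB threshold are all illustrative assumptions, not values taken from the paper.

```python
import math


def remove_silence(samples, frame_len=160, threshold_db=-35.0):
    """Keep only frames whose mean energy is within threshold_db of the loudest frame.

    Hypothetical helper: frame_len=160 assumes 20 ms frames at 8 kHz,
    and threshold_db is an arbitrary illustrative value.
    """
    # Split the signal into consecutive, non-overlapping frames.
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    # Mean energy of each frame.
    energies = [sum(x * x for x in frame) / len(frame) for frame in frames]
    peak = max(energies, default=0.0)
    if peak == 0.0:
        return []  # empty or all-zero input: nothing counts as speech
    kept = []
    for frame, energy in zip(frames, energies):
        # Frame level in dB relative to the loudest frame.
        level_db = 10.0 * math.log10(energy / peak) if energy > 0.0 else float("-inf")
        if level_db >= threshold_db:
            kept.extend(frame)
    return kept


# Synthetic example: a 440 Hz tone surrounded by digital silence.
silence = [0.0] * 320
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(480)]
signal = silence + tone + silence
voiced = remove_silence(signal)
print(len(signal), len(voiced))
```

A relative threshold is used here so that the sketch does not depend on the recording level; real telephone speech contains background noise, so a practical detector would typically add smoothing or hangover frames around detected speech.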
