A Universal Phoneme-Set Based Language Independent Short Utterance Speaker Recognition

In the field of speaker recognition, short utterance speaker recognition (SUSR) has attracted increasing attention in recent years. Despite advances in this technology and the use of phonetic cues for speaker recognition, the role of individual phonemes in carrying speaker information remains largely an open question. This paper proposes using phoneme classes as the basis for SUSR. In this work, we restrict ourselves to vowel classes and define combined vowel classes for two languages, English and Chinese. These sets are used first to build a universal background phoneme-class model (UBPM) and then for training and testing with conventional GMM-UBM systems. Experimental results show that speech segments as short as individual phonemes carry a surprising amount of useful speaker information.
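The abstract does not give the model equations, so here is a rough illustration of the conventional GMM-UBM pipeline it builds on. This is a minimal sketch under stated assumptions, not the authors' implementation. It trains a diagonal-covariance background GMM on frames from a single vowel class, which we treat as a stand-in for one UBPM. It then derives a speaker model by mean-only MAP adaptation with a relevance factor of 16, a common choice rather than a value taken from the paper. Finally, it scores a short test segment by the average per-frame log-likelihood ratio. All function names, the model sizes, and the synthetic Gaussian "features" standing in for MFCC frames are hypothetical.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """Per-component log N(x_t | mu_k, diag(var_k)); returns shape (T, K)."""
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]                      # (T, K, D)
    return -0.5 * (D * np.log(2 * np.pi)
                   + np.sum(np.log(variances), axis=1)[None, :]
                   + np.sum(diff ** 2 / variances[None, :, :], axis=2))

def responsibilities(X, weights, means, variances):
    """Component posteriors P(k | x_t) and per-frame log-likelihoods log p(x_t)."""
    log_p = log_gauss_diag(X, means, variances) + np.log(weights)[None, :]
    log_norm = np.logaddexp.reduce(log_p, axis=1)                # (T,)
    return np.exp(log_p - log_norm[:, None]), log_norm

def train_ubm(X, n_comp=4, n_iter=20, seed=0, var_floor=1e-3):
    """Plain EM for a diagonal GMM; here a hypothetical per-vowel-class background model."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), n_comp, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + var_floor, (n_comp, 1))
    weights = np.full(n_comp, 1.0 / n_comp)
    for _ in range(n_iter):
        gamma, _ = responsibilities(X, weights, means, variances)
        n_k = gamma.sum(axis=0) + 1e-10
        weights = n_k / n_k.sum()
        means = (gamma.T @ X) / n_k[:, None]
        # Weighted variance E[x^2] - mu^2, floored to keep every component proper.
        variances = np.maximum((gamma.T @ X ** 2) / n_k[:, None] - means ** 2, var_floor)
    return weights, means, variances

def map_adapt_means(X, ubm, relevance=16.0):
    """Mean-only MAP adaptation of the background model to one speaker's frames."""
    weights, means, variances = ubm
    gamma, _ = responsibilities(X, weights, means, variances)
    n_k = gamma.sum(axis=0)
    ex_k = (gamma.T @ X) / np.maximum(n_k, 1e-10)[:, None]       # per-component data mean
    alpha = (n_k / (n_k + relevance))[:, None]                   # data-dependent weight
    return weights, alpha * ex_k + (1 - alpha) * means, variances

def llr_score(X, speaker, ubm):
    """Average per-frame log-likelihood ratio of the speaker model against the background."""
    _, ll_spk = responsibilities(X, *speaker)
    _, ll_ubm = responsibilities(X, *ubm)
    return float(np.mean(ll_spk - ll_ubm))

# Synthetic stand-ins for feature frames of one vowel class (hypothetical data).
rng = np.random.default_rng(1)
background = rng.normal(0.0, 1.0, size=(2000, 3))
ubm = train_ubm(background)
spk_a = map_adapt_means(rng.normal(0.8, 1.0, size=(200, 3)), ubm)
test_a = rng.normal(0.8, 1.0, size=(50, 3))    # short segment, same speaker
test_b = rng.normal(-0.8, 1.0, size=(50, 3))   # short segment, different speaker
print(llr_score(test_a, spk_a, ubm), llr_score(test_b, spk_a, ubm))
```

Mean-only adaptation keeps the background weights and variances shared across speakers. This is one reason GMM-UBM systems can cope with little enrollment data: components that see few frames (small `n_k`) stay close to the background model. How the paper combines scores across several vowel classes is not stated in the abstract, so the sketch scores a single class only.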
