Comparative Study of Several Novel Acoustic Features for Speaker Recognition

Finding good features that represent speaker identity is an important problem in speaker recognition. A number of novel acoustic features have recently been proposed for this task. Researchers use different data sets, and sometimes different classifiers, to evaluate these features and compare them to baselines such as MFCC or LPCC. However, because the experimental conditions differ, direct comparison of those features to each other is difficult or impossible. This paper presents a study of five recently proposed acoustic features using the same data (NIST 2001 SRE) and the same UBM-GMM classifier. The results are presented as DET curves with equal error rates indicated. In addition, GMM scores produced on different features are combined with an SVM to determine whether the new features carry any complementary information. The results for the individual features and for their combinations are directly comparable to each other and to those obtained with the baseline MFCC features.
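As an illustration of the operating point marked on a DET curve, the following minimal sketch estimates the equal error rate from two lists of verification scores. It is not the evaluation code used in the study: the function name `equal_error_rate`, the convention that a trial is accepted when its score is at or above the threshold, and the example scores are all assumptions made for illustration.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the equal error rate (EER) from verification scores.

    Hypothetical helper, assuming a trial is accepted when its
    score is >= threshold. Returns (eer, threshold).
    """
    if not target_scores or not impostor_scores:
        raise ValueError("need at least one target and one impostor score")

    # Candidate thresholds: every observed score, plus one value above
    # the maximum so that "reject everything" is also considered.
    candidates = sorted(set(target_scores) | set(impostor_scores))
    candidates.append(candidates[-1] + 1.0)

    best = None
    for thr in candidates:
        # Miss rate: fraction of true-speaker trials that are rejected.
        miss = sum(s < thr for s in target_scores) / len(target_scores)
        # False-alarm rate: fraction of impostor trials that are accepted.
        fa = sum(s >= thr for s in impostor_scores) / len(impostor_scores)
        gap = abs(miss - fa)
        if best is None or gap < best[0]:
            # With finitely many trials the two rates may never be exactly
            # equal, so report their mean at the closest crossing point.
            best = (gap, (miss + fa) / 2.0, thr)
    return best[1], best[2]


# Made-up scores, for illustration only.
targets = [2.0, 1.5, 0.9, 0.2]
impostors = [1.0, 0.1, -0.3, -1.2]
eer, threshold = equal_error_rate(targets, impostors)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

The exhaustive threshold sweep is quadratic in the number of trials, which keeps the sketch short; a sorted single pass would be preferable for a full evaluation set.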
