Combining Novel Acoustic Features using SVM to Detect Speaker Changing Points

Automatic speaker change point detection separates different speakers in a continuous speech signal by utilising speaker characteristics. It is often a necessary step before applying a speaker recognition system. Acoustic features of the speech signal, such as Mel Frequency Cepstral Coefficients (MFCC) and Linear Prediction Cepstral Coefficients (LPCC), are commonly used to represent a speaker. However, these features are also affected by speech content, environment, type of recording device, etc. So far, no features have been discovered whose values depend only on the speaker. In this paper, four novel feature types, recently proposed in journal and conference papers for the speaker verification problem, are applied to the problem of speaker change point detection. The features are also combined in a scheme that uses a support vector machine (SVM) classifier. The results show that the proposed combination scheme improves speaker change point detection performance compared with a system that uses MFCC features only. Some of the novel low-dimensional features give speaker change point detection accuracy comparable to that of the high-dimensional MFCC features.
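
The abstract does not state the distance measure, window length, training labels or SVM kernel, so the following is only a minimal sketch of how per-feature-type evidence could be combined by an SVM, not the authors' implementation. It assumes each feature type (MFCC or one of the novel features) is a stream of per-frame vectors; at a candidate frame, a hypothetical left/right window distance (here simply the Euclidean distance between window means) is computed for every stream, and these distances form one input vector for a linear SVM trained with the Pegasos sub-gradient method. Candidate frames near a true speaker change are assumed to be labelled +1 and all others -1. The names `window_distance`, `score_vector`, `train_linear_svm` and `is_change` are illustrative.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def window_distance(left: Sequence[Vector], right: Sequence[Vector]) -> float:
    """Euclidean distance between the mean feature vectors of two windows.

    A deliberately simple, hypothetical stand-in for the statistical
    distance a real change detector would use.
    """
    dim = len(left[0])
    mean_l = [sum(f[d] for f in left) / len(left) for d in range(dim)]
    mean_r = [sum(f[d] for f in right) / len(right) for d in range(dim)]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mean_l, mean_r)))


def score_vector(streams: Sequence[Sequence[Vector]], t: int, win: int) -> Vector:
    """SVM input for candidate frame t (caller ensures win <= t <= len - win):
    one left/right window distance per feature stream, plus a constant 1.0
    so that the bias is learned as an ordinary weight."""
    scores = [window_distance(s[t - win:t], s[t:t + win]) for s in streams]
    return scores + [1.0]


def train_linear_svm(X: Sequence[Vector], y: Sequence[int],
                     lam: float = 0.1, epochs: int = 300, seed: int = 0) -> Vector:
    """Linear SVM trained with Pegasos: stochastic sub-gradient descent on
    lam/2 * ||w||^2 + mean hinge loss. Labels must be +1 (change) or -1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)
            # Margin is evaluated with the current weights, before the update.
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Sub-gradient step for the regulariser: shrink w.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Sub-gradient step for the hinge loss, only when it is active.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def is_change(w: Vector, x: Vector) -> bool:
    """Report a speaker change when the linear SVM score is positive."""
    return sum(wj * xj for wj, xj in zip(w, x)) > 0.0
```

Appending a constant 1.0 keeps the trainer short at the cost of regularising the bias. In practice a library SVM (for example scikit-learn's `SVC`) would normally be used, and a non-linear kernel may suit real distance scores better than the linear one assumed here.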
