Acceleration of sequence kernel computation for real-time speaker identification

The sequence kernel has been shown to be a promising kernel function for learning from sequential data such as speech and DNA. However, its high computational cost prevents it from scaling to massive datasets. In this paper, we propose a computationally efficient method of approximating the sequence kernel. More specifically, we formulate the approximation of the sequence kernel as the problem of obtaining a pre-image in a reproducing kernel Hilbert space. The effectiveness of the proposed approximation is demonstrated in text-independent speaker identification experiments with 10 male speakers: our approach significantly reduces computation time with limited performance degradation. Based on the proposed method, we develop a real-time kernel-based speaker identification system using Virtual Studio Technology (VST).
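The abstract does not spell out the kernel or the pre-image algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a mean-operator sequence kernel (the average of a Gaussian base kernel over all frame pairs) and uses the standard fixed-point iteration for Gaussian-kernel pre-images. All function names, the bandwidth `sigma`, and the scale factor `beta` are hypothetical choices made for this example.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian base kernel between two feature vectors (assumed choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sequence_kernel(X, Y, sigma=1.0):
    """Exact mean-operator sequence kernel (assumed form).

    Averages the base kernel over every pair of frames, so it needs
    len(X) * len(Y) base-kernel evaluations.
    """
    total = sum(gaussian_kernel(x, y, sigma) for x in X for y in Y)
    return total / (len(X) * len(Y))


def preimage(X, sigma=1.0, iters=50):
    """Approximate the mean feature map of X by beta * phi(z).

    z is found with the standard fixed-point iteration for Gaussian
    kernels; beta is the projection coefficient of the mean feature
    map onto phi(z). Returns the pair (z, beta).
    """
    dim = len(X[0])
    # Start from the plain average of the frames.
    z = [sum(x[d] for x in X) / len(X) for d in range(dim)]
    for _ in range(iters):
        weights = [gaussian_kernel(x, z, sigma) for x in X]
        norm = sum(weights)
        if norm == 0.0:  # all frames too far from z; keep the current point
            break
        z = [sum(w * x[d] for w, x in zip(weights, X)) / norm
             for d in range(dim)]
    # k(z, z) = 1 for the Gaussian kernel, so beta is a plain average.
    beta = sum(gaussian_kernel(x, z, sigma) for x in X) / len(X)
    return z, beta


def approx_sequence_kernel(pre_x, pre_y, sigma=1.0):
    """Approximate sequence kernel from two precomputed (z, beta) pairs."""
    (zx, bx), (zy, by) = pre_x, pre_y
    return bx * by * gaussian_kernel(zx, zy, sigma)


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]
    Y = [[1.0, 1.0], [1.1, 1.0], [1.0, 1.1]]
    print("exact :", sequence_kernel(X, Y))
    print("approx:", approx_sequence_kernel(preimage(X), preimage(Y)))
```

In this sketch, the exact kernel costs one base-kernel evaluation per frame pair, whereas the approximation needs a single evaluation per sequence pair once each sequence's pre-image has been computed. A single pre-image point can only represent a sequence well when its frames are tightly clustered relative to `sigma`, which is one reason to treat this as a toy illustration.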
