Speaker identification using dynamic time warping with stress compensation technique

We present an algorithm for isolated-word, text-dependent speaker identification under the normal talking style and four stressful talking styles: shout, slow, loud, and soft. These styles are designed to simulate speech produced under real stressful conditions. The algorithm is based on dynamic time warping (DTW) combined with a cepstral stress compensation technique. Cepstral coefficients and transitional coefficients are combined to form the observation vector for DTW. Compared with DTW alone, DTW with cepstral stress compensation improves the recognition rate at the cost of a small increase in computation: from 33% to 67% for the shout style, from 51% to 84% for the slow style, from 40% to 80% for the loud style, and from 52% to 70% for the soft style. The algorithm was tested on a limited number of speakers because of our limited database.
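
The matching step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each utterance is already available as a NumPy array of cepstral coefficients with shape (frames, coefficients), uses simple time differences as a stand-in for the transitional coefficients, and models stress compensation as subtracting a fixed cepstral offset vector. The abstract does not specify the form of the compensation, so that offset, like the function names `add_deltas`, `dtw_distance`, and `identify`, is a hypothetical choice made only for illustration.

```python
import numpy as np

def add_deltas(cepstra):
    """Append time differences to each frame (assumed stand-in for
    the transitional coefficients mentioned in the abstract)."""
    deltas = np.gradient(cepstra, axis=0)
    return np.hstack([cepstra, deltas])

def dtw_distance(a, b):
    """Length-normalised DTW distance between two (frames, dims) sequences."""
    n, m = len(a), len(b)
    # Euclidean distance between every frame of a and every frame of b.
    local = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = local[i - 1, j - 1] + best_prev
    return acc[n, m] / (n + m)

def identify(test_cepstra, templates, offset=None):
    """Return the speaker whose reference template is closest under DTW.

    offset is a hypothetical per-style cepstral shift that is subtracted
    from the test utterance before matching (an assumption, see above).
    """
    if offset is not None:
        test_cepstra = test_cepstra - offset
    test = add_deltas(test_cepstra)
    scores = {spk: dtw_distance(test, add_deltas(ref))
              for spk, ref in templates.items()}
    return min(scores, key=scores.get)
```

In this sketch, `templates` maps a speaker label to that speaker's reference utterance of the same word, recorded in the normal style. A slow rendition is handled by the warping itself, while a style-dependent spectral shift would have to be estimated separately and passed in as `offset`.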
