Text-dependent speaker identification using neural network on distinctive Thai tone marks

This paper presents a neural-network-based, text-dependent speaker identification system for the Thai language. Linear prediction coefficients (LPC) are extracted from the speech signal and formed into feature vectors. These features are fed into a multilayer perceptron (MLP) neural network, trained with the backpropagation learning algorithm, for both training and identification. The five Thai tone marks are considered closely when choosing the sentences in order to achieve the best speaker identification accuracy. Five spoken texts, one for each Thai tone, and a mixed-tone text are compared experimentally. The average identification rate on 9 speakers exceeds 95% when using the mixed-tone text, while poor results occur with the middle- and low-tone texts, which usually lead to vague or unclear speech.
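The LPC feature-extraction step can be illustrated with a minimal sketch. It uses the standard autocorrelation method with the Levinson-Durbin recursion, which is one common way to compute LPC and not necessarily the procedure used in the paper. The function names, the Hamming window, and the prediction order of 10 are assumptions chosen for illustration; the abstract does not state them. The MLP classifier itself is not shown.

```python
import numpy as np

def autocorr(x, order):
    """Autocorrelation of x at lags 0..order."""
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])

def levinson_durbin(r):
    """Solve the LPC normal equations from autocorrelation lags r[0..p].

    Returns predictor coefficients a[0..p-1] such that
    x[n] is approximated by sum_k a[k] * x[n - 1 - k].
    """
    order = len(r) - 1
    a = np.zeros(order)
    err = r[0]  # prediction error energy at order 0
    for i in range(order):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        # Reflection coefficient for model order i + 1.
        k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / err
        prev = a[:i].copy()
        a[:i] = prev - k * prev[::-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def lpc_features(frame, order=10):
    """LPC feature vector for one speech frame (order is an assumed value)."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))  # assumed window choice
    return levinson_durbin(autocorr(windowed, order))
```

In a system like the one described, one such vector per frame (or a combination of them per utterance) would form the input to the MLP; how the frames are combined is not specified in the abstract.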
