Thai Text-Dependent Speaker Identification by ANN with Two Different Time Normalization Techniques

A text-dependent speaker identification system for the Thai language is proposed. The Thai isolated digits 0 to 9 and their concatenations were used as the spoken text. A well-known artificial neural network (ANN), the multilayer perceptron (MLP) trained with the backpropagation learning algorithm, served as the recognition engine because of its simplicity and low processing time. Because an MLP has a fixed number of input neurons, a time normalization algorithm must be applied to the speech signal so that every utterance yields the same number of input speech features. Two time normalization algorithms, linear interpolation and synchronized overlap and add (SOLA), were implemented and compared. Experimental results showed that the choice of time normalization algorithm clearly affected system performance. SOLA, which preserves more of the original sound than linear interpolation, gave a better identification rate for all spoken digits.
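The role of time normalization can be illustrated with a minimal sketch of the linear interpolation variant only. It is not the authors' implementation: the function name `linear_time_normalize`, the per-frame feature layout, and the target length are assumptions made for illustration, and the abstract does not say whether interpolation was applied to feature frames or to the waveform itself.

```python
def linear_time_normalize(frames, target_len):
    """Resample a sequence of feature vectors to exactly target_len vectors.

    frames: list of equal-length lists of floats (one list per analysis frame).
    This layout is an assumption for illustration, not taken from the paper.
    """
    if not frames:
        raise ValueError("need at least one frame")
    if target_len < 1:
        raise ValueError("target_len must be positive")
    n = len(frames)
    # Degenerate cases: nothing to interpolate between.
    if n == 1 or target_len == 1:
        return [list(frames[0]) for _ in range(target_len)]
    out = []
    for j in range(target_len):
        # Map output index j onto the input time axis [0, n - 1].
        pos = j * (n - 1) / (target_len - 1)
        lo = min(int(pos), n - 2)   # left neighbour, clamped so lo + 1 is valid
        w = pos - lo                # weight of the right neighbour
        out.append([(1.0 - w) * a + w * b
                    for a, b in zip(frames[lo], frames[lo + 1])])
    return out


# Hypothetical 3-frame utterance with 2 features per frame,
# stretched to 5 frames and flattened into one fixed-size MLP input vector.
utterance = [[0.0, 10.0], [1.0, 20.0], [2.0, 30.0]]
fixed = linear_time_normalize(utterance, 5)
mlp_input = [x for frame in fixed for x in frame]
```

Whatever the original utterance duration, `mlp_input` always has `target_len` times the feature dimension entries, which is what a fixed-size MLP input layer requires. SOLA reaches the same fixed length by overlapping and adding waveform segments at points of maximum similarity rather than interpolating between values.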
