Vowel Recognition using Neural Networks

Summary: Speech recognition techniques have advanced dramatically in recent years. Nevertheless, errors caused by environmental noise remain a serious problem. Algorithms that detect and track lip motion have been widely used to improve the performance of speech recognition systems. This paper presents a novel technique for recognizing vowels. Lip features extracted with a combined method are used as input parameters to a neural network for recognition. The accuracy of the proposed method is verified by using it to recognize the six main Farsi vowels.
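
The summary does not specify the lip features, the network architecture, or the training procedure, so the following is only a minimal sketch of the general idea: a small feed-forward network that maps a lip-feature vector to one of six vowel classes. The feature count, the synthetic data, the layer sizes, and all function names are assumptions made for illustration; they are not the paper's method.

```python
import numpy as np

N_VOWELS = 6     # six vowel classes, as stated in the summary
N_FEATURES = 4   # assumption: e.g. lip width, height, area, aspect ratio

# Assumption: one well-separated synthetic cluster center per vowel class.
CENTERS = 2.0 * np.array([
    [1, 1, 1, 1], [1, 1, -1, -1], [1, -1, 1, -1],
    [-1, 1, 1, -1], [-1, -1, 1, 1], [-1, 1, -1, 1],
], dtype=float)

def make_synthetic_data(rng, per_class=50):
    """Generate stand-in 'lip feature' vectors; real features would come from images."""
    X = np.concatenate(
        [c + rng.normal(0.0, 0.5, size=(per_class, N_FEATURES)) for c in CENTERS]
    )
    y = np.repeat(np.arange(N_VOWELS), per_class)
    return X, y

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_mlp(X, y, hidden=16, lr=0.1, epochs=2000, seed=0):
    """Train a one-hidden-layer tanh network with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, N_VOWELS))
    b2 = np.zeros(N_VOWELS)
    onehot = np.eye(N_VOWELS)[y]
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)                  # hidden activations
        p = softmax(h @ W2 + b2)                  # class probabilities
        d_logits = (p - onehot) / len(X)          # gradient of mean cross-entropy
        d_h = (d_logits @ W2.T) * (1.0 - h ** 2)  # backpropagate through tanh
        W2 -= lr * (h.T @ d_logits)
        b2 -= lr * d_logits.sum(axis=0)
        W1 -= lr * (X.T @ d_h)
        b1 -= lr * d_h.sum(axis=0)
    return W1, b1, W2, b2

def predict(params, X):
    """Return the index of the most likely vowel class for each feature vector."""
    W1, b1, W2, b2 = params
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)
```

In a real system the rows of `X` would be replaced by features measured from lip images, and accuracy would be reported on held-out speakers or utterances rather than on the training data.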
