A phoneme-based approach to large-vocabulary Chinese sign language recognition

The major challenge in sign language recognition is developing approaches that scale well with increasing vocabulary size. We present an approach to large-vocabulary, continuous Chinese sign language (CSL) recognition that uses phonemes, rather than whole signs, as the basic units. Since the number of phonemes is limited, HMM-based training and recognition of the CSL signal becomes more tractable and can extend to larger vocabularies. The proposed method also facilitates CSL recognition when the finger-alphabet is blended with gestures. About 2400 phonemes are defined for CSL. One HMM is built for each phoneme, and the signs are then encoded as sequences of these phonemes. A decoder based on a tree-structured network is presented. Clustering of the Gaussians on the HMM states, a language model, and an N-best pass are used to improve the performance of the system. Experiments on a 5119-sign vocabulary yield promising results.
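The core idea above, that signs are encoded over a small phoneme inventory so only phoneme-level HMMs need training, can be illustrated with a toy sketch. Everything here is hypothetical: the phoneme names, the two-sign lexicon, the discrete feature alphabet, and all probabilities are invented for illustration, and the real system uses continuous Gaussian observations and searches segmentations jointly through a tree-structured network rather than scoring a fixed segmentation as done below.

```python
import math

def _log(p):
    # Safe log: zero-probability events map to -inf instead of raising.
    return math.log(p) if p > 0 else float("-inf")

# One toy discrete-observation HMM per phoneme (2 states, left-to-right).
PHONEME_HMMS = {
    "p1": {"start": [0.9, 0.1],
           "trans": [[0.7, 0.3], [0.0, 1.0]],
           "emit":  [{"a": 0.8, "b": 0.2}, {"a": 0.3, "b": 0.7}]},
    "p2": {"start": [0.8, 0.2],
           "trans": [[0.6, 0.4], [0.0, 1.0]],
           "emit":  [{"a": 0.2, "b": 0.8}, {"a": 0.5, "b": 0.5}]},
}

# Signs are encoded as phoneme sequences: the lexicon can grow without
# adding new HMMs, which is the scalability argument of the abstract.
SIGN_LEXICON = {
    "sign_A": ["p1", "p2"],
    "sign_B": ["p2", "p1"],
}

def viterbi_log_prob(hmm, obs):
    """Best-path log-probability of an observation segment under one HMM."""
    n = len(hmm["start"])
    scores = [_log(hmm["start"][i]) + _log(hmm["emit"][i].get(obs[0], 0.0))
              for i in range(n)]
    for o in obs[1:]:
        scores = [max(scores[i] + _log(hmm["trans"][i][j]) for i in range(n))
                  + _log(hmm["emit"][j].get(o, 0.0))
                  for j in range(n)]
    return max(scores)

def score_sign(phonemes, segments):
    """Sum per-phoneme Viterbi scores over a given (here: fixed) segmentation."""
    return sum(viterbi_log_prob(PHONEME_HMMS[p], seg)
               for p, seg in zip(phonemes, segments))

# Illustrative pre-segmented feature stream for one continuous sign.
obs_segments = [["a", "a"], ["b", "b"]]
best = max(SIGN_LEXICON,
           key=lambda s: score_sign(SIGN_LEXICON[s], obs_segments))
print(best)  # the lexicon entry whose phoneme HMMs best explain the stream
```

A real decoder would additionally fold in language-model scores and rescore hypotheses in an N-best pass, as the abstract describes.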
