Lip reading system using novel Japanese visemes classification and hierarchical weighted discrimination

In recent years, automatic lip reading based on visemes has been studied by researchers aiming to realize human-machine interactive communication systems in many applications. However, several problems remain open, such as how many viseme classes to define, how to discriminate among visemes, and how to perform speech recognition based on visemes. In this paper, a novel classification of Japanese visemes and a hierarchical weighted discrimination method for speech recognition are proposed to address these problems. We increase the number of viseme classes from the conventional six to nine so that words can be represented in more detail by visemes. In addition, since discrimination becomes more difficult as the number of visemes grows, a hierarchical weighted discrimination method is proposed. For comparison with the conventional method, word recognition experiments were conducted on the ATR phonetically balanced word set, a large vocabulary that covers a wide variety of visemes. The results confirm that the proposed method performs well.
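The two ideas in the abstract, a finer nine-class viseme inventory and a two-stage weighted discrimination, can be illustrated with a minimal sketch. The viseme labels, the three coarse mouth-shape groups, the 2-D lip features, the centroids, and the per-class weights below are all illustrative assumptions, not the paper's actual classes or parameters:

```python
import math

# Hypothetical 9-class viseme inventory (labels are illustrative, not the
# paper's actual Japanese viseme classes), grouped into three assumed coarse
# mouth-shape groups. Centroids are toy 2-D lip features, e.g.
# (normalized mouth width, normalized mouth height), invented for this sketch.
HIERARCHY = {
    "closed":    {"m": (0.30, 0.05), "u": (0.35, 0.15)},
    "half-open": {"e": (0.55, 0.30), "i": (0.60, 0.20), "n": (0.45, 0.25)},
    "open":      {"a": (0.70, 0.60), "o": (0.50, 0.55),
                  "ja": (0.65, 0.45), "wa": (0.55, 0.50)},
}

# Assumed per-class weights modelling how reliably each viseme can be
# discriminated; visually confusable classes get smaller weights.
WEIGHTS = {"m": 1.0, "u": 0.8, "e": 0.9, "i": 0.7, "n": 0.6,
           "a": 1.0, "o": 0.9, "ja": 0.5, "wa": 0.6}

def _dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def classify(feature):
    """Two-stage (hierarchical) weighted discrimination sketch.

    Stage 1: pick the coarse group whose member centroids are closest to
    the feature on average. Stage 2: within that group, score each viseme
    by a weight-scaled distance and return the best match.
    """
    # Stage 1: coarse mouth-shape group by mean distance to member centroids.
    group = min(
        HIERARCHY,
        key=lambda g: sum(_dist(feature, c) for c in HIERARCHY[g].values())
                      / len(HIERARCHY[g]),
    )
    # Stage 2: weighted fine discrimination restricted to the chosen group.
    viseme = min(HIERARCHY[group],
                 key=lambda v: _dist(feature, HIERARCHY[group][v]) / WEIGHTS[v])
    return group, viseme

print(classify((0.68, 0.58)))  # a wide, open mouth shape -> ('open', 'a')
```

The hierarchy keeps each fine decision among a few similar shapes rather than all nine classes at once, which is the motivation the abstract gives for discriminating an enlarged viseme set in stages.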
