A survey of automatic lip reading approaches

The current era of human–computer interaction aims to make communication between humans and their artificial partners (computers) easier and more reliable, and vocal interaction is one of its central tasks. Speech recognition can be improved by visual information from the human face; in the literature, the analysis of lip shape and its movement is referred to as lip reading. Automatic lip reading plays a vital role in automatic speech recognition and is an important step towards accurate and robust recognition. In this paper we describe our approach to automatic lip reading in detail, using an Active Appearance Model (AAM) and a Hidden Markov Model (HMM). Appearance-based methods extract features directly from the raw image: the AAM automatically detects a speaker's facial feature points and models the appearance parameters of the lips, while the HMM is used for recognition. We describe visual features based on shape and appearance descriptions and give detailed information about the automatic lip reading system. The individual characteristics and efficiency of the two approaches (AAM and HMM) are evaluated separately, and their performances are then compared and discussed. We also show the recognition process for common visual features (i.e. lip selection and movement extraction) for improved lip reading.
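The recognition stage described above can be illustrated with a minimal sketch, not the paper's implementation: one HMM is trained per word, and a sequence of lip features is assigned to the word whose model gives it the highest likelihood. Here the AAM feature extraction is stubbed out, lip shapes are assumed to be vector-quantized into discrete symbols, and all model parameters are invented for illustration; only the forward-algorithm scoring is real.

```python
import math

def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the forward algorithm (scaled to avoid underflow).
    pi: initial state probabilities, A: transitions, B: emissions."""
    n = len(pi)
    # Initialization (t = 0)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    s = sum(alpha)
    log_lik = math.log(s)
    alpha = [a / s for a in alpha]
    # Induction over the rest of the sequence, rescaling at each step
    for o in obs[1:]:
        alpha = [sum(alpha[j] * A[j][i] for j in range(n)) * B[i][o]
                 for i in range(n)]
        s = sum(alpha)
        log_lik += math.log(s)
        alpha = [a / s for a in alpha]
    return log_lik

# Two toy word models over 3 quantized lip-shape symbols
# (0 = closed, 1 = rounded, 2 = spread); all parameters are invented.
models = {
    "ba": dict(pi=[1.0, 0.0],
               A=[[0.6, 0.4], [0.1, 0.9]],
               B=[[0.8, 0.1, 0.1], [0.1, 0.2, 0.7]]),
    "oo": dict(pi=[1.0, 0.0],
               A=[[0.6, 0.4], [0.1, 0.9]],
               B=[[0.2, 0.7, 0.1], [0.3, 0.6, 0.1]]),
}

def recognize(obs):
    """Pick the word whose HMM assigns the sequence the highest likelihood."""
    return max(models, key=lambda w: forward_log_likelihood(obs, **models[w]))

print(recognize([0, 0, 2, 2, 2]))  # closed then spread lips -> "ba"
print(recognize([1, 1, 1, 1]))     # rounded lips throughout -> "oo"
```

In a full system, each observation symbol would come from quantizing the AAM appearance parameters of one video frame, and the per-word models would be trained with Baum–Welch rather than set by hand.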
