Feature analysis for automatic speechreading

Audio-visual automatic speech recognition (ASR) systems use visual information to improve recognition in both clean and noisy environments. This paper investigates several visual feature extraction methods. It was observed that, for visual speech recognition, the visual feature vector must retain a base level of detail to achieve improved recognition. Geometric feature extraction yields lower recognition rates than pixel-based methods because it discards characteristic speech information such as lip protrusion. Downsampling the images also lowers visual recognition scores, again because detail is lost. The role of dynamic features in improving recognition was also investigated. When the dimension of the feature vector is restricted (e.g. to 50), static features alone outperform a combination of static and dynamic features. This suggests that, in visual speech recognition, attaining a sufficient level of detail is a higher priority than adding dynamic information. Once this base level of detail is attained, dynamic features should then be able to improve the recognition rate further.
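The static-versus-dynamic trade-off under a fixed feature dimension can be illustrated with a minimal Python sketch. It assumes the per-frame static vectors (for example, pixel-based transform coefficients) are already ordered from most to least important, and it computes dynamic features with the common regression formula d_t = sum over n of n(c_{t+n} - c_{t-n}) / (2 sum over n of n^2), using a window of two frames. Neither that ordering, the delta formula, nor the helper names `delta` and `build_features` are taken from the paper; they are hypothetical choices for illustration.

```python
from typing import List, Sequence

Frame = List[float]  # one static feature vector per video frame


def delta(frames: Sequence[Frame], window: int = 2) -> List[Frame]:
    """Regression-based dynamic (delta) features.

    Frames beyond the sequence edges are replaced by the first/last frame.
    (Hypothetical helper; the paper does not specify its delta formula.)
    """
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out: List[Frame] = []
    for t in range(num_frames):
        d = [0.0] * len(frames[t])
        for n in range(1, window + 1):
            nxt = frames[min(t + n, num_frames - 1)]  # clamp at the end
            prv = frames[max(t - n, 0)]               # clamp at the start
            for k in range(len(d)):
                d[k] += n * (nxt[k] - prv[k])
        out.append([v / denom for v in d])
    return out


def build_features(static: Sequence[Frame], budget: int,
                   use_dynamic: bool) -> List[Frame]:
    """Fit each frame's features into a fixed dimension `budget`.

    Static only: keep the first `budget` static coefficients.
    Static + dynamic: keep fewer static coefficients and fill the rest
    of the budget with their deltas, so detail is traded for dynamics.
    (Hypothetical helper; assumes coefficients are ordered by importance.)
    """
    if not use_dynamic:
        return [list(f[:budget]) for f in static]
    n_delta = budget // 2
    n_static = budget - n_delta
    kept = [list(f[:n_static]) for f in static]
    return [s + d[:n_delta] for s, d in zip(kept, delta(kept))]
```

With `budget=50`, the static-only configuration keeps 50 static coefficients per frame, whereas the combined configuration keeps 25 static and 25 delta coefficients. Every static coefficient given up for a delta is detail that, according to the results above, the recogniser needs before dynamic information can help.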
