A Survey on Different Visual Speech Recognition Techniques

In automatic speech recognition (ASR), visual speech information plays a pivotal role, especially in the presence of acoustic noise. This paper provides a short review of the different methods used in visual speech recognition (VSR) systems. We discuss the different stages of VSR, including face and lip localization techniques and visual feature extraction techniques. We also provide details of the audio-visual databases related to this study.
