Spoken Word Recognition from Side of Face Using Infrared Lip Movement Sensor

To realize multimodal speech recognition on a mobile phone, a small sensor is needed that can measure lip movement at low computational cost. In a previous study, we developed a simple infrared lip movement sensor placed in front of the mouth and demonstrated the feasibility of HMM-based word recognition, achieving a recognition rate of 87.1%. In practical use, however, it is difficult to mount a sensor in front of the mouth. In this paper, we develop a new lip movement sensor that extracts lip movement from either side of the speaker's face, and we evaluate its performance. Experimental results show a speaker-independent word recognition rate of 85.3% using only the lip movement captured by the side-mounted sensor.
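The recognition back end described above is HMM-based. As a rough illustration of how a one-dimensional lip-movement signal from such a sensor could be classified into words, the sketch below quantizes the signal into discrete symbols and scores it against per-word discrete HMMs with the forward algorithm. The word labels, model topology, and all parameter values here are invented for illustration; they are not the paper's actual models or training procedure.

```python
import math

def forward_log_prob(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the scaled forward algorithm.
    pi: initial state probs, A: state transitions, B: emission probs."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    logp = math.log(sum(alpha))
    alpha = [a / sum(alpha) for a in alpha]
    for t in range(1, len(obs)):
        alpha = [sum(alpha[j] * A[j][i] for j in range(n)) * B[i][obs[t]]
                 for i in range(n)]
        s = sum(alpha)
        logp += math.log(s)
        alpha = [a / s for a in alpha]
    return logp

def quantize(signal, levels=4, lo=0.0, hi=1.0):
    """Map a real-valued lip-opening signal to discrete symbols 0..levels-1."""
    step = (hi - lo) / levels
    return [min(levels - 1, max(0, int((x - lo) / step))) for x in signal]

# Two toy 2-state left-to-right word models (hypothetical words):
# "hai" expects an open-then-closed mouth trajectory, "iie" the reverse.
WORD_MODELS = {
    "hai": dict(pi=[1.0, 0.0],
                A=[[0.7, 0.3], [0.0, 1.0]],
                B=[[0.05, 0.05, 0.3, 0.6],   # state 0: wide opening
                   [0.6, 0.3, 0.05, 0.05]]), # state 1: nearly closed
    "iie": dict(pi=[1.0, 0.0],
                A=[[0.7, 0.3], [0.0, 1.0]],
                B=[[0.6, 0.3, 0.05, 0.05],
                   [0.05, 0.05, 0.3, 0.6]]),
}

def recognize(signal):
    """Pick the word model with the highest forward log-likelihood."""
    obs = quantize(signal)
    return max(WORD_MODELS,
               key=lambda w: forward_log_prob(obs, **WORD_MODELS[w]))
```

For example, a signal that starts with a wide mouth opening and then closes (`[0.9, 0.8, 0.7, 0.2, 0.1]`) is assigned to the open-then-closed model. A real system would use continuous-density HMMs trained on many speakers, but the decoding principle is the same.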
