Spoken Word Recognition from Side of Face Using Infrared Lip Movement Sensor

To realize multimodal speech recognition on a mobile phone, a small sensor is needed that can measure lip movement at low computational cost. In a previous study, we developed a simple infrared lip movement sensor placed in front of the mouth and demonstrated the feasibility of HMM-based word recognition, achieving a recognition rate of 87.1%. In practical use, however, it is difficult to mount a sensor in front of the mouth. In this paper, we develop a new lip movement sensor that extracts lip movement from either side of the speaker's face, and we evaluate its performance. Experimental results show a speaker-independent word recognition rate of 85.3% using only the lip movement captured by the side-mounted sensor.
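The recognition back end described above is HMM-based. As a rough illustration of how a one-dimensional lip-movement signal from such a sensor could be classified into words, the sketch below quantizes the signal into discrete symbols and scores it against per-word discrete HMMs with the forward algorithm. The word labels, model topology, and all parameter values here are invented for illustration; they are not the paper's actual models or training procedure.

```python
import math

def forward_log_prob(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the scaled forward algorithm.
    pi: initial state probs, A: state transitions, B: emission probs."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    logp = math.log(sum(alpha))
    alpha = [a / sum(alpha) for a in alpha]
    for t in range(1, len(obs)):
        alpha = [sum(alpha[j] * A[j][i] for j in range(n)) * B[i][obs[t]]
                 for i in range(n)]
        s = sum(alpha)
        logp += math.log(s)
        alpha = [a / s for a in alpha]
    return logp

def quantize(signal, levels=4, lo=0.0, hi=1.0):
    """Map a real-valued lip-opening signal to discrete symbols 0..levels-1."""
    step = (hi - lo) / levels
    return [min(levels - 1, max(0, int((x - lo) / step))) for x in signal]

# Two toy 2-state left-to-right word models (hypothetical words):
# "hai" expects an open-then-closed mouth trajectory, "iie" the reverse.
WORD_MODELS = {
    "hai": dict(pi=[1.0, 0.0],
                A=[[0.7, 0.3], [0.0, 1.0]],
                B=[[0.05, 0.05, 0.3, 0.6],   # state 0: wide opening
                   [0.6, 0.3, 0.05, 0.05]]), # state 1: nearly closed
    "iie": dict(pi=[1.0, 0.0],
                A=[[0.7, 0.3], [0.0, 1.0]],
                B=[[0.6, 0.3, 0.05, 0.05],
                   [0.05, 0.05, 0.3, 0.6]]),
}

def recognize(signal):
    """Pick the word model with the highest forward log-likelihood."""
    obs = quantize(signal)
    return max(WORD_MODELS,
               key=lambda w: forward_log_prob(obs, **WORD_MODELS[w]))
```

For example, a signal that starts with a wide mouth opening and then closes (`[0.9, 0.8, 0.7, 0.2, 0.1]`) is assigned to the open-then-closed model. A real system would use continuous-density HMMs trained on many speakers, but the decoding principle is the same.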
