Review of a framework for audiovisual dialog-based human-computer interaction

This paper reviews a practical system that aims to detect a user's intent to speak to a computer. The system treats speech recognized from both audio and visual information as contextual information, thereby making communication between users and computers more human-like. It employs an adaptive module to select a grammar appropriate to the program. Furthermore, the system uses the visual modality in addition to audio to increase word accuracy.