Emotional expressions in audiovisual human computer interaction

Visual and auditory modalities are two of the most commonly used channels in human-to-human interaction. We describe a system that continuously monitors a user's voice and facial motions to recognize emotional expressions, an ability that is crucial for intelligent computers taking on social roles such as an actor or a companion. We outline methods for extracting audio and visual features that are useful for classifying emotions, and discuss how audio and visual information should be handled in single-modal and bimodal settings. We report audio-only and video-only emotion recognition results for the same subjects, in both person-dependent and person-independent settings, and outline methods for bimodal recognition.
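To make the bimodal setting concrete, the sketch below shows one common way to combine the two modalities: decision-level (late) fusion, where a separate classifier is trained per modality and their class posteriors are averaged. This is only an illustrative example under assumed placeholder inputs; the feature arrays, label set, fusion weight, and classifier choice are not taken from the paper.

```python
# Minimal sketch of decision-level (late) fusion for bimodal emotion
# recognition. Features, labels, and models are illustrative placeholders,
# not the ones used by the authors.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder features: one row per utterance/clip.
n_samples, n_audio_feat, n_video_feat, n_emotions = 200, 12, 20, 6
X_audio = rng.normal(size=(n_samples, n_audio_feat))  # e.g. prosodic statistics
X_video = rng.normal(size=(n_samples, n_video_feat))  # e.g. facial motion parameters
y = rng.integers(0, n_emotions, size=n_samples)       # emotion class labels

# Train one classifier per modality.
audio_clf = LogisticRegression(max_iter=1000).fit(X_audio, y)
video_clf = LogisticRegression(max_iter=1000).fit(X_video, y)

def fuse_predict(xa, xv, w_audio=0.5):
    """Weighted average of per-modality class posteriors, then argmax."""
    p_audio = audio_clf.predict_proba(xa)
    p_video = video_clf.predict_proba(xv)
    return np.argmax(w_audio * p_audio + (1 - w_audio) * p_video, axis=1)

print(fuse_predict(X_audio[:5], X_video[:5]))
```

Feature-level (early) fusion would instead concatenate the audio and visual feature vectors and train a single classifier; the choice between the two depends on how well the streams are synchronized and which modality is more reliable for a given user.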