Emotion Recognition Using Prosodic Features

In computer vision, a feature is a set of measurements. Each measurement carries a piece of information that describes a property or characteristic of the object. In speech recognition, the starting point of research is how speech signals are produced by the speaker and perceived by the listener.

In human speech communication, ideas take shape in the speaker's brain as a word sequence, which is then passed on by the speaker's text generator. The speech generator, which models the general human vocal system, converts the word sequence into a speech signal that travels through the air to the listener. On the listener's side, the human auditory system receives the acoustic signal, and the listener's brain processes it to understand its content. A speech recognizer is modeled by the speech decoder, which decodes the acoustic signal back into a word sequence. Speech production and speech perception are therefore inverse processes in speech recognition applications.
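To make the idea of a feature as a set of measurements concrete, here is a minimal sketch that computes two simple measurements from one short frame of a speech signal and groups them into a feature vector. The choice of short-time energy and zero-crossing rate, the hypothetical function name `frame_features`, and the synthetic sine-wave input are illustrative assumptions, not the feature set used in this work.

```python
import math


def frame_features(samples: list[float]) -> dict[str, float]:
    """Return a small feature vector (a set of measurements) for one frame.

    Hypothetical helper; the two measurements are illustrative assumptions:
    - energy: mean squared amplitude of the frame, a rough loudness cue
    - zcr: fraction of adjacent sample pairs whose sign changes
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    # Short-time energy: average of the squared sample values.
    energy = sum(x * x for x in samples) / n
    # Zero-crossing rate: count sign changes between neighbouring samples.
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    zcr = crossings / (n - 1)
    return {"energy": energy, "zcr": zcr}


# Synthetic example frame: a 100 Hz sine sampled at 8 kHz for 20 ms.
sr = 8000
frame = [math.sin(2 * math.pi * 100 * t / sr) for t in range(160)]
print(frame_features(frame))
```

Each key in the returned dictionary is one measurement; in a typical recognizer such vectors would be computed frame by frame and the resulting sequence passed to a classifier.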