Optimal state dependent spectral representation for HMM modeling: a new theoretical framework

Acoustic speech modeling systems generally consist of two stages. In the first, an analysis module extracts from the speech signal a sequence of feature vectors that describes the speech in a time-frequency space; Mel-frequency cepstral coefficients (MFCC) are a popular feature set. In the second stage, the feature sequences are modeled stochastically, generally with hidden Markov models (HMM) [8]. To compute the MFCCs, a spectral analysis is first performed with a filterbank defined on a Mel scale; the logarithm is then applied to the filterbank energies, followed by a cosine transform. The Mel frequency scale, a psycho-acoustic scale, has a higher resolution in the low-frequency bands than in the high-frequency bands. Besides the psycho-acoustic characteristics, increasing the frequency resolution in the low