In this paper, we propose a new set of speech feature parameters based on formant structure information. The speech signal is first decomposed into sub-bands by a gammatone filterbank, and the Teager energy signal of each sub-band is then extracted with the Teager Energy Operator. Two energy separation algorithms, DESA-1 and DESA-2, are applied to obtain the instantaneous amplitude and frequency envelopes of the formants, respectively. Finally, the feature vector is constructed from the amplitude and frequency information of the formants. The motivation for developing this feature is twofold: formant location information is a highly distinctive speech representation that has seldom been applied in speech recognition systems, and conventional feature parameters such as mel-frequency cepstral coefficients (MFCC) and linear prediction cepstral coefficients (LPCC) do not explicitly model spectral peak information, which is an important cue for identifying different phones. A Mandarin digit string recognition task is performed to evaluate the proposed feature parameters. The results show improved recognition performance compared with conventional MFCC and LPCC features.
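The energy-operator stage described above can be illustrated with a minimal sketch. It assumes the standard discrete Teager Energy Operator, Ψ[x(n)] = x(n)² − x(n−1)·x(n+1), and the usual textbook forms of DESA-1 and DESA-2; the function names `teager`, `desa1`, and `desa2` and the `eps` guard are hypothetical choices for illustration, not the authors' implementation. The input is assumed to be a single sub-band signal that has already been band-pass filtered (the gammatone filterbank itself is not shown).

```python
import numpy as np


def teager(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns N-2 values; output index i corresponds to input sample n = i + 1.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def desa1(x, eps=1e-12):
    """DESA-1 sketch: amplitude and frequency (rad/sample) for n = 2 .. N-3."""
    x = np.asarray(x, dtype=float)
    psi_x = teager(x)[1:-1]            # Psi[x(n)] for n = 2 .. N-3
    psi_y = teager(np.diff(x))         # Psi[y(n)], y(n) = x(n) - x(n-1), n = 2 .. N-2
    pair_sum = psi_y[:-1] + psi_y[1:]  # Psi[y(n)] + Psi[y(n+1)]
    # eps guards against division by zero (an assumption of this sketch).
    c = 1.0 - pair_sum / (4.0 * np.maximum(psi_x, eps))
    c = np.clip(c, -1.0, 1.0)
    omega = np.arccos(c)
    amp = np.sqrt(np.maximum(psi_x, 0.0) / np.maximum(1.0 - c ** 2, eps))
    return amp, omega


def desa2(x, eps=1e-12):
    """DESA-2 sketch: amplitude and frequency (rad/sample) for n = 2 .. N-3."""
    x = np.asarray(x, dtype=float)
    psi_x = teager(x)[1:-1]            # Psi[x(n)] for n = 2 .. N-3
    psi_y = teager(x[2:] - x[:-2])     # Psi[y(n)], y(n) = x(n+1) - x(n-1)
    ratio = psi_y / (2.0 * np.maximum(psi_x, eps))
    omega = 0.5 * np.arccos(np.clip(1.0 - ratio, -1.0, 1.0))
    amp = 2.0 * psi_x / np.sqrt(np.maximum(psi_y, eps))
    return amp, omega
```

For a pure tone x(n) = A·cos(Ωn + φ) with 0 < Ω < π/2, both estimators recover A and Ω exactly up to rounding, which gives a simple sanity check. To convert the frequency estimate to hertz, multiply by fs / (2π), where fs is the sampling rate.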