Previous work has shown that accuracy on isolated word recognition can be improved by using cepstral-time matrices as the speech feature instead of the more conventional MFCC-based feature augmented with higher-order cepstra. This work extends those improvements to UK English connected digit strings and to a sub-word based town names task. Experimental results are presented for a range of cepstral-time matrix widths, from a stack width of 3 up to 13 MFCC frames. In addition, a variety of columns are selected from the cepstral-time matrix for use as the final speech feature. Tests show that the optimal implementation of the cepstral-time matrix varies according to the specific recognition task. Finally, linear discriminant analysis (LDA) is applied to cepstral-time matrices and is shown to improve recognition performance while also reducing the size of the final speech feature. Three different implementations of LDA are described and demonstrated on isolated digit and sub-word tasks.
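The construction summarized above can be illustrated with a minimal sketch. It assumes that a cepstral-time matrix is formed by stacking M consecutive MFCC vectors and taking a DCT along the time axis of each cepstral coefficient, after which a subset of columns is kept as the feature. The abstract does not spell out the exact transform, normalisation, or column choices, so the function names, the unnormalised DCT-II, and the selected columns below are illustrative assumptions rather than the authors' implementation.

```python
import math


def dct_ii(values):
    """Unnormalised DCT-II of a 1-D sequence (assumed transform, see lead-in)."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (2 * t + 1) / (2 * n))
            for t, v in enumerate(values))
        for k in range(n)
    ]


def cepstral_time_matrix(frames, columns):
    """Build a cepstral-time matrix from a stack of MFCC frames.

    frames  : list of M MFCC vectors, each of length N (M is the stack width)
    columns : indices of the temporal DCT columns to keep (hypothetical choice)

    Returns an N x len(columns) matrix; row n holds the selected DCT
    coefficients of the n-th cepstral coefficient's trajectory over time.
    """
    width = len(frames)
    n_ceps = len(frames[0])
    matrix = []
    for n in range(n_ceps):
        # Time trajectory of cepstral coefficient n across the stacked frames.
        trajectory = [frames[t][n] for t in range(width)]
        coeffs = dct_ii(trajectory)
        matrix.append([coeffs[c] for c in columns])
    return matrix


def flatten(matrix):
    """Concatenate the matrix rows into a single feature vector."""
    return [value for row in matrix for value in row]


# Toy input: a stack of 3 frames with 2 cepstral coefficients each.
stack = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
feature = flatten(cepstral_time_matrix(stack, columns=[0, 1]))
```

With this unnormalised transform, column 0 is the sum of each coefficient over the stack and the higher columns capture how the coefficient changes over time, so a constant stack like the toy input gives near-zero values outside column 0. Varying the stack width and the `columns` argument corresponds to the matrix widths and column selections compared in the experiments.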