One popular feature type in speech recognition is based on linear transformations of sequences of cepstral feature vectors. The transformation is generally estimated in two steps: first, a transformation such as linear discriminant analysis (LDA) or heteroscedastic linear discriminant analysis (HLDA) is used to maximize the separation between classes and reduce the dimensionality; a decorrelating transformation is then applied. Here we investigate how classes are weighted when estimating the LDA transformation. In particular, we are concerned with the special status of silence, for which the data can be arbitrarily long and which can be represented by more than one silence/noise model. Our acoustic models for commercial applications are a special case: they consist of several sub-models for the different types of application, such as general English, digits, names, and the alphabet. This creates a conflict, because a transformation like LDA tries to improve the separability of states that correspond to the same phoneme but are used in different types of task. We also evaluate replacing sample counts with error/accuracy counts, as well as cross-task estimation of the LDA transformation. The results show that it is important to take these conditions into account, and they demonstrate accuracy/speed improvements when appropriate care is taken in computing the LDA transformations.
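To make the role of class weighting concrete, the following is a minimal NumPy sketch of LDA estimation in which each class contributes to the scatter matrices through an explicit weight. It is an illustration under stated assumptions, not the estimation procedure used in this work: the function names (`weighted_scatter`, `lda_projection`, `count_weights`), the idea of capping the silence count at a fixed value, and the Cholesky-based solver are all hypothetical choices made for the example.

```python
import numpy as np


def count_weights(labels, silence_label=None, cap=None):
    """Sample-count weights per class, optionally capping the silence class.

    The cap is an assumed, illustrative way to limit the influence of silence.
    """
    labels = np.asarray(labels)
    weights = {c: float(np.sum(labels == c)) for c in np.unique(labels).tolist()}
    if silence_label is not None and cap is not None:
        weights[silence_label] = min(weights[silence_label], float(cap))
    return weights


def weighted_scatter(features, labels, weights):
    """Return weighted within-class and between-class scatter matrices.

    `weights` maps each class label to a non-negative weight; the weights
    are normalized to sum to one before use.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    dim = features.shape[1]
    classes = np.unique(labels).tolist()
    total = float(sum(weights[c] for c in classes))

    means = {c: features[labels == c].mean(axis=0) for c in classes}
    global_mean = sum((weights[c] / total) * means[c] for c in classes)

    within = np.zeros((dim, dim))
    between = np.zeros((dim, dim))
    for c in classes:
        w = weights[c] / total
        centered = features[labels == c] - means[c]
        # Per-class covariance, weighted by the class weight.
        within += w * (centered.T @ centered) / len(centered)
        diff = means[c] - global_mean
        between += w * np.outer(diff, diff)
    return within, between


def lda_projection(within, between, out_dim):
    """Solve between @ v = lambda * within @ v and keep the top directions.

    The within-class scatter is whitened with its Cholesky factor so that a
    symmetric eigensolver can be used. Rows of the result are projections.
    """
    chol = np.linalg.cholesky(within)      # within = chol @ chol.T
    inv_chol = np.linalg.inv(chol)
    sym = inv_chol @ between @ inv_chol.T  # symmetric whitened problem
    eigvals, eigvecs = np.linalg.eigh(sym)
    order = np.argsort(eigvals)[::-1][:out_dim]
    return (inv_chol.T @ eigvecs[:, order]).T
```

With `count_weights(labels)` the sketch reduces to conventional count-weighted LDA; passing a `silence_label` and a `cap` shows how limiting the silence weight changes the scatter matrices and therefore the projection. The same `weights` dictionary could in principle hold error or accuracy counts instead of sample counts.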