Performance comparisons of all-pass transform adaptation with maximum likelihood linear regression

All-pass transform (APT) adaptation transforms the cepstral means of a hidden Markov model so as to mimic the effect of warping the short-time frequency axis of a segment of speech, much like vocal tract length normalization (VTLN). However, APT adaptation can be implemented as a linear transformation in the cepstral domain, much like the better-known maximum likelihood linear regression (MLLR). Recent work demonstrated that APT adaptation outperforms MLLR on a large vocabulary conversational speech recognition task. This work presents similar comparisons on the Switchboard corpus. We found that without VTLN, the best MLLR and APT systems achieved word error rates (WERs) of 43.0% and 40.2%, respectively. Similarly, with VTLN the respective error rates were 40.3% and 39.2%, so that APT adaptation is significantly better in both cases. We also undertook a set of experiments to determine whether APT adaptation can be combined with a linear semi-tied covariance (STC) transform. With a single APT per speaker, the application of STC reduced the WER from 42.9% to 39.4%.
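
To make the frequency-warping idea concrete, the following is a minimal sketch of the warping induced by the simplest all-pass map, the first-order bilinear transform. It is an illustration only, not the transform family or estimation procedure used in this work: the function name `bilinear_warp` and the warp parameter `alpha` are assumptions introduced here, and the formula is the standard phase response of a first-order all-pass filter.

```python
import math


def bilinear_warp(omega: float, alpha: float) -> float:
    """Warped frequency (radians) under a first-order all-pass map.

    Hypothetical helper for illustration. `omega` is a normalized
    frequency in [0, pi]; `alpha` is a real warp parameter, |alpha| < 1.
    alpha = 0 leaves the frequency axis unchanged.
    """
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy |alpha| < 1")
    # Phase of the first-order all-pass evaluated on the unit circle.
    # The second atan2 argument is always positive when |alpha| < 1.
    return omega + 2.0 * math.atan2(
        alpha * math.sin(omega), 1.0 - alpha * math.cos(omega)
    )


if __name__ == "__main__":
    # Print the warped axis at a few points for an assumed warp parameter.
    for k in range(5):
        w = k * math.pi / 4
        print(f"omega={w:.3f} -> warped={bilinear_warp(w, 0.1):.3f}")
```

In this sketch a positive `alpha` maps each interior frequency to a higher one and a negative `alpha` to a lower one, while the endpoints 0 and pi stay fixed, which is the kind of monotonic axis warping that VTLN also applies.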
