Defining the controlling parameter in constrained discriminative linear transform for supervised speaker adaptation

Constrained discriminative linear transform (CDLT) optimized with Extended Baum-Welch (EBW) has been presented in the literature as a discriminative speaker adaptation method that outperforms the conventional maximum likelihood algorithm. How to set the controlling parameter of EBW to achieve the best speaker adaptation performance, however, remains an open question. This paper presents an empirical study of this issue. Our experimental results suggest that a log-linear relationship exists between the optimal controlling parameter and the amount of data. This relationship can be used to set the controlling parameter efficiently for each test speaker and thereby improve CDLT performance. We also discuss the possibility of generalizing the log-linear rule to a wider range of learning problems, since such knowledge can substantially reduce the computational effort of parameter tuning.
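
As a rough illustration of how such a rule could be used in practice, the sketch below fits a line to (log amount of data, tuned parameter) pairs and then predicts a parameter value for a new speaker. The abstract does not state the exact functional form, so the form `param = a + b * log(amount)` is an assumption, as are the function names `fit_log_linear` and `predict_param`. The numbers are synthetic placeholders generated from a known line, not results from the paper.

```python
import math


def fit_log_linear(amounts, optimal_params):
    """Ordinary least-squares fit of param ~= a + b * log(amount).

    Assumed form: the parameter is linear in the natural log of the
    amount of data. The paper may use a different parameterization.
    """
    xs = [math.log(n) for n in amounts]
    ys = list(optimal_params)
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope with respect to log(amount)
    a = mean_y - b * mean_x  # intercept
    return a, b


def predict_param(a, b, amount):
    """Predict the controlling parameter for a given amount of data."""
    return a + b * math.log(amount)


# Synthetic placeholder data: amounts of data (arbitrary units) and
# "tuned" parameters generated from a known line, for illustration only.
amounts = [10.0, 20.0, 40.0, 80.0]
tuned = [2.0 + 0.5 * math.log(n) for n in amounts]

a, b = fit_log_linear(amounts, tuned)
predicted = predict_param(a, b, 30.0)
```

With a rule of this kind fitted once on development speakers, the parameter for a new test speaker would come from a single evaluation of `predict_param` rather than a per-speaker search, which is the saving in tuning effort the abstract refers to.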
