A study on the influence of covariance adaptation on Jacobian compensation in vocal tract length normalization

In this paper, we first show that accounting for the Jacobian in Vocal-Tract Length Normalization (VTLN) degrades performance when there is a mismatch between the train and test speaker conditions. VTLN is implemented using our recently proposed approach of a linear transformation of conventional MFCC features, i.e., a feature transformation. In this case, the Jacobian is simply the determinant of the linear transformation. A feature transformation is equivalent to transforming the means and covariances of the model by the inverse transformation while leaving the data unchanged. Using a set of adaptation experiments, we analyze the reasons for the degradation under Jacobian compensation and conclude that applying the same VTLN transformation to both means and variances does not fully match the data when the speaker conditions are mismatched. This may have similar implications for constrained MLLR in mismatched speaker conditions. We then propose using covariance adaptation on top of VTLN to account for the covariance mismatch between the train and test speakers, and show that accounting for the Jacobian after covariance adaptation improves performance.
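
The equivalence between a feature transformation with Jacobian compensation and a model-space transformation can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses a single 2-dimensional Gaussian with a full covariance, and the matrix `A`, the vector `x`, the model parameters, and all helper names are arbitrary, hypothetical choices, not the VTLN transformation, features, or models used in the paper. It compares log N(Ax; mu, Sigma) + log|det A| with log N(x; A^-1 mu, A^-1 Sigma A^-T).

```python
import math

# Minimal 2x2 linear-algebra helpers (hypothetical names, pure Python).
def matvec(A, x):
    return [A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]

def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def inverse(A):
    d = det(A)
    return [[A[1][1] / d, -A[0][1] / d],
            [-A[1][0] / d, A[0][0] / d]]

def log_gauss(x, mu, sigma):
    """Log-density of a 2-D Gaussian with full covariance sigma."""
    diff = [x[0] - mu[0], x[1] - mu[1]]
    sol = matvec(inverse(sigma), diff)
    maha = diff[0] * sol[0] + diff[1] * sol[1]
    return -0.5 * (2 * math.log(2 * math.pi) + math.log(det(sigma)) + maha)

# Arbitrary illustrative values (assumptions, not taken from the paper).
A = [[1.1, 0.2], [-0.1, 0.9]]        # stand-in for a linear feature transformation
x = [0.2, 0.7]                       # stand-in for one untransformed feature vector
mu = [0.5, -1.0]                     # model mean
sigma = [[1.0, 0.3], [0.3, 2.0]]     # model covariance

log_jacobian = math.log(abs(det(A)))

# Feature space: transform the data, then add the log-Jacobian term.
feature_space = log_gauss(matvec(A, x), mu, sigma) + log_jacobian

# Model space: leave the data unchanged and transform the mean and
# covariance by the inverse transformation.
A_inv = inverse(A)
mu_model = matvec(A_inv, mu)
sigma_model = matmul(matmul(A_inv, sigma), transpose(A_inv))
model_space = log_gauss(x, mu_model, sigma_model)

print(feature_space, model_space)
```

The two log-likelihoods agree up to floating-point error, while dropping `log_jacobian` from the feature-space score leaves a gap of exactly log|det A|. The check also makes visible that the same matrix is tied to both the mean and the covariance of the model, which is the constraint the abstract identifies as problematic under mismatched speaker conditions.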