Vocal tract normalization based on spectral warping

Two techniques for speaker adaptation based on frequency scale modifications are described and evaluated. In one method, minimum mean square error matching is performed between a spectral template for each speaker to a "typical speaker" spectral template. One parameter, a warping factor, is used to control the spectral matching. In the second method, a neural network classifier is used to adjust the frequency warping factor for each speaker so as to maximize vowel classification performance for each speaker. A vowel classifier trained only with normalized female speech and tested only with normalized male speech, or vice versa, is nearly as accurate as when speaker genders are matched for training and testing, and the speech is not normalized. The improvement due to normalization is much smaller, if training and test data are matched. The normalization based on classification performance is superior to that based on minimizing mean square error.