CORRECTIVE TRAINING FOR SPEAKER ADAPTATION

Xiuyang Yu and Wayne Ward
Center for Spoken Language Understanding
University of Colorado, Boulder, Colorado

ABSTRACT

This paper reports results on an experiment to use corrective training techniques for rapid acoustic speaker adaptation in a semi-continuous speech recognition system. Decoder output is used to adjust HMM acoustic models to improve discrimination between correct words and near misses. Twenty sentences are used as an adaptation set. A speech recognizer is run on each utterance to generate a word lattice, and the lattice is pruned relative to the correct path. The forward-backward algorithm is used to align each path in the lattice against the speech input and compute observation counts. For each input frame, counts in correct models are adjusted upward, and counts in incorrect models are adjusted downward. The adjusted counts are normalized to generate new observation probabilities for the models. The parameters being adjusted are the mixture weights for the semi-continuous HMMs. The technique reduced word error for a test subject by 37% relative.

INTRODUCTION

Speech recognition systems based on Hidden Markov Models typically suffer significant performance degradation when a speaker is not well represented in the data the system was trained on. Rapid speaker adaptation techniques can be very effective in improving performance for a novel speaker. These techniques use a small number of sentences (20-40) with known transcripts to quickly adapt HMM acoustic models to a new speaker. Such techniques are important because, for longer-term adaptation to work at all, the system must function well enough for the user to be productive; they attempt to correct very poor models quickly and bring the system to a usable point. This paper reports results on an experiment to use corrective training techniques to rapidly adapt HMM acoustic models to a poorly recognized speaker.

CORRECTIVE TRAINING

Corrective training was introduced for speaker-dependent isolated word recognition by [1] and extended to speaker-independent continuous speech recognition by [2]. Hidden Markov Models are normally trained according to a maximum likelihood criterion: parameters are adjusted to maximize the probability of the training set. This process does not directly minimize word errors; it does so indirectly, by attempting to assign a high probability to the correct utterance, and it takes no account of the probability assigned to near misses. Corrective training seeks to minimize the number of word errors directly, by adjusting parameters so as to improve discrimination between correct words and near misses. The general process is:

1. Generate a set of near misses, words that are confusable with the correct words.
2. Align the correct words against the input.
3. Align a near miss against the input.
4. Modify the model so that the correct words become more likely and the incorrect ones less likely.
5. Repeat steps 3-4 for the remaining near misses.

Speech recognizers are used to generate the near misses. In [1] and [2], a speech recognizer was used to generate an n-best list for an isolated word or sentence, and this list was used as the set of near misses. Both the correct utterance and each near miss are aligned against the input, and the model parameters are then adjusted to make the correct words more likely and the incorrect ones less likely.
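To make the count-adjustment step concrete, the short Python sketch below shows one way adjusted counts could be renormalized into new mixture weights. It is an illustration under stated assumptions, not the authors' implementation: the per-state count tables, the step size eta, and the count floor are hypothetical, and the correct-path and competitor counts are assumed to have already been accumulated by forward-backward alignment of the correct path and of the pruned near-miss paths, as described above.

    # Illustrative sketch (not the authors' code) of the count adjustment and
    # renormalization described above.  Each argument maps an HMM state to a
    # list of per-codeword counts; correct_counts and competitor_counts are
    # assumed to come from forward-backward alignment of the correct path and
    # of the pruned near-miss paths, respectively.
    def adjust_mixture_weights(base_counts, correct_counts, competitor_counts,
                               eta=1.0, floor=1e-6):
        new_weights = {}
        for state, counts in base_counts.items():
            adjusted = []
            for k, c in enumerate(counts):
                # Push the count up where the correct path used this codeword,
                # down where the competing paths did.
                a = c + eta * (correct_counts[state][k] - competitor_counts[state][k])
                adjusted.append(max(a, floor))  # keep counts strictly positive
            total = sum(adjusted)
            # Renormalize the adjusted counts into new mixture weights.
            new_weights[state] = [a / total for a in adjusted]
        return new_weights

In a sketch like this, eta governs how far the small adaptation set is allowed to move the weights away from the speaker-independent estimates, and the floor keeps every codeword weight strictly positive.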
SPHINX-II OBSERVATION ESTIMATES

In order to describe how corrective training is used to adapt our model, it is first necessary to describe the basic model. Our experimental system uses the Carnegie Mellon University Sphinx-II system for speech recognition [3],[4],[5]. Sphinx-II uses semi-continuous Hidden Markov Models [3] to model context-dependent phones. Like continuous HMM systems, semi-continuous systems use a weighted sum of Gaussian probability density function values to estimate observation probabilities. The difference is that, while continuous systems estimate a separate set of densities for each HMM state in the system, semi-continuous systems share a single codebook of Gaussian densities across all states, so that each state keeps only its own set of mixture weights over that shared codebook. These mixture weights are the parameters adjusted by the adaptation procedure described in this paper.
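As a concrete illustration of the shared-codebook computation, the Python sketch below evaluates a semi-continuous observation probability as a weighted sum over one common set of diagonal-covariance Gaussians, with a separate weight vector per state. The function names, the diagonal-covariance form, and the restriction to the top few densities per frame are our simplifying assumptions, not a description of the Sphinx-II internals.

    import math

    def gaussian_log_density(x, mean, var):
        # Log density of a diagonal-covariance Gaussian evaluated at frame x.
        return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
                   for xi, m, v in zip(x, mean, var))

    def semicontinuous_observation_prob(frame, codebook, state_weights, top_n=4):
        # codebook: shared list of (mean, var) pairs used by every state.
        # state_weights: this state's mixture weights over the shared codebook.
        densities = [(k, math.exp(gaussian_log_density(frame, mean, var)))
                     for k, (mean, var) in enumerate(codebook)]
        # Keep only the best-matching codewords for this frame, a common
        # approximation; the exact number retained is an assumption here.
        densities.sort(key=lambda kv: kv[1], reverse=True)
        return sum(state_weights[k] * d for k, d in densities[:top_n])

Because the Gaussians are shared, adapting a speaker reduces to re-estimating the per-state weight vectors, which is exactly what the corrective count adjustment above modifies.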
REFERENCES

[1] Mosur Ravishankar, et al. Efficient Algorithms for Speech Recognition. 1996.
[2] Lalit R. Bahl, et al. A new algorithm for the estimation of hidden Markov model parameters. ICASSP-88, International Conference on Acoustics, Speech, and Signal Processing, 1988.
[3] Mei-Yuh Hwang, et al. Shared-distribution hidden Markov models for speech recognition. IEEE Trans. Speech Audio Process., 1993.
[4] Kai-Fu Lee, et al. Corrective and reinforcement learning for speaker-independent continuous speech recognition. EUROSPEECH, 1989.
[5] Xuedong Huang, et al. Semi-continuous hidden Markov models for speech recognition. 1989.