Speaker adaptation for word‐based speech recognition systems

This work aims to enhance the speaker‐independent performance of word‐based speech recognition systems by rapidly and automatically deducing general characteristics of the current speaker and using them to derive speaker‐normalizing transforms. Dynamic programming (DP) matching is used to align and compare corresponding frames of the incoming speech and the reference vocabulary. A single transform is then computed for all voiced speech and another for all unvoiced speech. Each transform consists of a linear filtering component and, optionally, a constrained frequency shift. Experiments have been carried out with twenty speakers (male and female, native and non‐native speakers of English), each producing 150 digits. Adaptation on all 150 digits reduces recognition errors by a factor of three (from 4.5% to 1.5%). With adaptation on just three randomly selected digits, the reduction factor is two. Frequency shifting is useful only when the amount of adaptation material is large and the reference speech is not exclusively from the same sex as the cur...
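
The alignment and filter estimation described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that frames are log-spectral vectors (so a linear filter becomes an additive per-channel offset), that the local distance is Euclidean, that the DP step pattern is the simple symmetric one, and that a voiced/unvoiced flag is already available for each incoming frame. The names `dtw_path` and `estimate_filters` are hypothetical, and the optional constrained frequency shift is omitted.

```python
import numpy as np

def dtw_path(test, ref):
    """Align two frame sequences (one frame per row) by dynamic programming.

    Returns a list of (test_index, ref_index) pairs on the lowest-cost path.
    """
    n, m = len(test), len(ref)
    # Assumed local distance: Euclidean distance between frames.
    dist = np.linalg.norm(test[:, None, :] - ref[None, :, :], axis=2)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
            cost[i, j] = dist[i - 1, j - 1] + best_prev
    # Backtrack from the final cell to recover the aligned frame pairs.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1, j - 1], i - 1, j - 1),
            (cost[i - 1, j], i - 1, j),
            (cost[i, j - 1], i, j - 1),
        ]
        _, i, j = min(steps, key=lambda s: s[0])
    path.reverse()
    return path

def estimate_filters(test, ref, voiced):
    """Estimate one additive log-spectral correction per voicing class.

    `voiced[i]` is an assumed, externally supplied flag for test frame i.
    Returns {True: voiced_offset, False: unvoiced_offset}.
    """
    diffs = {True: [], False: []}
    for i, j in dtw_path(test, ref):
        diffs[bool(voiced[i])].append(ref[j] - test[i])
    dim = test.shape[1]
    return {
        k: (np.mean(v, axis=0) if v else np.zeros(dim))
        for k, v in diffs.items()
    }
```

Under these assumptions, adding the estimated offset for the matching class to each incoming frame moves the speaker's average spectrum toward that of the reference speech before recognition.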