Unsupervised speaker adaptation for robust speech recognition in real environments

To achieve high-precision speech recognition in real environments, phone model adaptation procedures are required that can rapidly account for a wide range of speakers and acoustic noise conditions. In this paper, we propose an unsupervised speaker adaptation method that extends an existing unsupervised speaker and environment adaptation method based on sufficient statistics from HMMs: the input is first processed by spectral subtraction, and a known noise is then added to it. Existing methods assume that a model is trained to match each type of background noise that will be encountered during recognition, and they do not consider variations in the signal-to-noise ratio or changes in the background noise for a given input. In contrast, our method suppresses the noise in the input data using an estimate of the noise spectrum and then adds a known stable noise on top of the residual noise that remains in the input after subtraction. This smooths out differences between background noises and enables us to perform recognition with a single set of acoustic models. For speaker adaptation, we select the set of closest speakers from our database on the basis of a single arbitrary utterance from the test speaker and retrain the acoustic models using the sufficient statistics of those speakers. By combining these two methods, we are able to adapt rapidly and accurately to a new speaker. In recognition experiments at a signal-to-noise ratio of 20 dB under a variety of noise conditions, the proposed method achieved a recognition rate 2 percent higher than that of a speaker-independent model matched to the test noise environment in each noise environment, with an average recognition rate of 85.1 percent overall. We also compared our method with a standard supervised adaptation technique, maximum likelihood linear regression (MLLR). © 2005 Wiley Periodicals, Inc. 
Electron Comm Jpn Pt 2, 88(8): 30–41, 2005; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecjb.20199
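The noise-normalization idea (estimate the noise spectrum, subtract it, then add a known stable noise so that every input carries the same background) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. It operates directly on per-frame power spectra rather than waveforms. The noise is estimated from the first few frames, which are assumed to be speech-free. The over-subtraction factor `alpha`, the spectral floor `beta`, and the function names are hypothetical choices. The known noise is added in the power domain, which assumes it is uncorrelated with the input.

```python
def estimate_noise(power_frames, n_noise_frames):
    """Average the first n frames, assumed (hypothetically) to contain no speech."""
    head = power_frames[:n_noise_frames]
    n_bins = len(power_frames[0])
    return [sum(frame[k] for frame in head) / len(head) for k in range(n_bins)]


def spectral_subtract(power_frames, noise, alpha=1.0, beta=0.01):
    """Power spectral subtraction with a floor of beta * noise to avoid negative power."""
    return [
        [max(p - alpha * n, beta * n) for p, n in zip(frame, noise)]
        for frame in power_frames
    ]


def add_known_noise(power_frames, known_noise):
    """Add a fixed, known noise power spectrum to every frame.

    Adding powers assumes the known noise is uncorrelated with the input.
    """
    return [[p + k for p, k in zip(frame, known_noise)] for frame in power_frames]


def normalize_noise(power_frames, known_noise, n_noise_frames=10, alpha=1.0, beta=0.01):
    """Subtract the estimated input noise, then mask the residual with known noise."""
    noise = estimate_noise(power_frames, n_noise_frames)
    cleaned = spectral_subtract(power_frames, noise, alpha, beta)
    return add_known_noise(cleaned, known_noise)
```

For example, `normalize_noise([[1.0, 2.0], [1.5, 2.5], [5.0, 3.0]], [0.5, 0.5], n_noise_frames=2)` estimates the noise as `[1.25, 2.25]`. The first two (noise-only) frames then end up close to the known noise level, while the third frame keeps its excess energy. Because acoustic models would be trained on data carrying the same known noise, a single model set can serve inputs whose original background noise differed.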
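The speaker-selection and retraining step relies on the fact that sufficient statistics from different speakers can simply be summed before re-estimating model parameters. The sketch below is a deliberately simplified illustration, not the paper's procedure. Each database speaker is represented by a single diagonal Gaussian whose hypothetical statistics (occupancy, per-dimension sums, and sums of squares) come from hard frame assignment. Speakers are ranked by the average log-likelihood of one test utterance. The selected speakers' statistics are pooled to give maximum-likelihood means and variances. In a real system, these would be per-state, occupancy-weighted HMM accumulators. The names `Stats`, `select_closest`, and `pool_statistics` are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class Stats:
    """Sufficient statistics for one diagonal Gaussian (hypothetical layout)."""
    occ: float     # total occupancy (here: frame count, hard assignment)
    sum_x: list    # per-dimension sum of observations
    sum_x2: list   # per-dimension sum of squared observations


def stats_from_frames(frames):
    """Accumulate statistics with every frame assigned to the Gaussian."""
    dim = len(frames[0])
    return Stats(
        occ=float(len(frames)),
        sum_x=[sum(f[d] for f in frames) for d in range(dim)],
        sum_x2=[sum(f[d] ** 2 for f in frames) for d in range(dim)],
    )


def mean_var(stats, var_floor=1e-6):
    """Maximum-likelihood mean and (floored) variance from accumulated statistics."""
    mean = [s / stats.occ for s in stats.sum_x]
    var = [max(s2 / stats.occ - m * m, var_floor) for s2, m in zip(stats.sum_x2, mean)]
    return mean, var


def avg_log_likelihood(frames, mean, var):
    """Average per-frame log-likelihood under a diagonal Gaussian."""
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)


def select_closest(utterance, speakers, n):
    """Return the n speaker names whose models best explain the utterance."""
    def score(name):
        mean, var = mean_var(speakers[name])
        return avg_log_likelihood(utterance, mean, var)
    return sorted(speakers, key=score, reverse=True)[:n]


def pool_statistics(stats_list):
    """Sum statistics across speakers, then re-estimate mean and variance."""
    dim = len(stats_list[0].sum_x)
    pooled = Stats(
        occ=sum(s.occ for s in stats_list),
        sum_x=[sum(s.sum_x[d] for s in stats_list) for d in range(dim)],
        sum_x2=[sum(s.sum_x2[d] for s in stats_list) for d in range(dim)],
    )
    return mean_var(pooled)
```

With toy speakers `A`, `B`, and `C` built from `[[0.0], [2.0]]`, `[[4.0], [6.0]]`, and `[[9.0], [11.0]]`, the utterance `[[5.0], [5.0]]` selects `["B", "A"]` for `n=2`. Pooling their statistics gives mean `[3.0]` and variance `[5.0]`. Because only the stored statistics are summed, no training data need to be revisited, which is what makes this kind of adaptation fast.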