Eigenvoice speaker adaptation has been shown to be effective when only a small amount of adaptation data is available. At the heart of the method is principal component analysis (PCA), which is used to find the most important eigenvoices. In this paper, we postulate that nonlinear PCA, in particular kernel PCA, may be even more effective. One major challenge is mapping the feature-space eigenvoices back to the observation space so that the state observation likelihoods can be computed during the estimation of eigenvoice weights and during subsequent decoding. Our solution is to compute kernel PCA using composite kernels; we call the new method kernel eigenvoice speaker adaptation. On the TIDIGITS corpus, we found that, compared with a speaker-independent model, our kernel eigenvoice adaptation method reduces the word error rate by 28–33%, whereas the standard eigenvoice approach only matches the performance of the speaker-independent model.
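To make the idea of kernel PCA with a composite kernel concrete, the following is a minimal sketch and not the implementation evaluated above. It assumes that each speaker is represented by a supervector split into equal-size blocks, and that the composite kernel is a direct sum of RBF kernels, one per block. The function names (`rbf_kernel`, `composite_kernel`, `kernel_pca`), the kernel width `beta`, the block layout, and the random data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def rbf_kernel(a, b, beta):
    """RBF kernel matrix between rows of a (n, d) and rows of b (m, d)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-beta * np.maximum(sq, 0.0))  # clip tiny negative distances

def composite_kernel(X, Y, n_blocks, beta):
    """Assumed direct-sum composite: sum of RBF kernels over equal-size blocks."""
    X_blocks = np.split(X, n_blocks, axis=1)
    Y_blocks = np.split(Y, n_blocks, axis=1)
    return sum(rbf_kernel(xb, yb, beta) for xb, yb in zip(X_blocks, Y_blocks))

def kernel_pca(K, n_components):
    """Kernel PCA on a training kernel matrix K (n, n)."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    Kc = H @ K @ H                            # kernel centered in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = eigvals[order]
    # Scale coefficients so each feature-space eigenvector has unit norm.
    alpha = eigvecs[:, order] / np.sqrt(lam)
    return lam, alpha, Kc

# Hypothetical data: 20 training speakers, supervectors of 4 blocks x 3 dims.
rng = np.random.default_rng(0)
supervectors = rng.normal(size=(20, 12))

K = composite_kernel(supervectors, supervectors, n_blocks=4, beta=0.5)
lam, alpha, Kc = kernel_pca(K, n_components=3)

# Coordinates of each training speaker along the leading kernel eigenvoices.
projections = Kc @ alpha
```

In this sketch the leading eigenvectors exist only implicitly, as coefficient vectors `alpha` over the training speakers. That is why relating them back to the observation space requires extra care, and why the structure of the kernel matters.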