Speaker adaptive phoneme recognition based on feature mapping from spectral domain to probabilistic domain

A feature parameter space for speech recognition, called PRPG (probability ratios between phoneme group pairs), is described and applied to speaker-adaptive phoneme recognition. In the proposed coordinate system, each area carrying the same information for speech recognition is compressed into one point. The mapping function from the spectral coordinate system to the proposed one is realized using a neural network. Code-vectors designed in this coordinate system are guaranteed to be information-theoretically more efficient than those designed in the spectral coordinate system. Moreover, by the definition of the coordinate system, the axes have the same meaning for different speakers, so speaker adaptation can be performed easily without trajectory mapping. Experimental results show that coordinate conversion reduces errors by 40% in speaker-dependent tasks. The scores for speaker-adaptive tasks in the proposed feature domain are always superior to those for speaker-dependent tasks in the spectral domain.
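As an illustration of what a probability-ratio coordinate might look like, the following minimal sketch maps per-frame phoneme posteriors to one value per phoneme-group pair. The abstract does not give the exact PRPG formula, so the normalized ratio P(A|x) / (P(A|x) + P(B|x)), the function name `prpg_features`, the smoothing constant `eps`, and the example phoneme groups are all assumptions made for illustration. In the paper the mapping from spectra is realized by a neural network; here the posteriors are simply taken as given.

```python
def prpg_features(posteriors, group_pairs, eps=1e-12):
    """Map phoneme posteriors to one ratio per phoneme-group pair.

    posteriors:  dict mapping phoneme label -> P(phoneme | frame)
    group_pairs: list of (group_a, group_b); each group is a collection
                 of phoneme labels
    Returns a list with P(A|x) / (P(A|x) + P(B|x)) for each pair.
    NOTE: this normalized ratio is an assumed form, not the paper's
    stated definition.
    """
    features = []
    for group_a, group_b in group_pairs:
        # Probability mass of each group is the sum over its phonemes.
        p_a = sum(posteriors.get(ph, 0.0) for ph in group_a)
        p_b = sum(posteriors.get(ph, 0.0) for ph in group_b)
        # eps avoids division by zero when both groups have no mass.
        features.append((p_a + eps) / (p_a + p_b + 2 * eps))
    return features


if __name__ == "__main__":
    # Hypothetical posteriors for a single frame.
    frame = {"a": 0.6, "i": 0.2, "k": 0.15, "s": 0.05}
    # Hypothetical group pairs: vowels vs. consonants, then /a/ vs. /i/.
    pairs = [(("a", "i"), ("k", "s")), (("a",), ("i",))]
    print(prpg_features(frame, pairs))
```

Under this assumed form, any two spectral frames that yield the same posteriors map to the same point, and each axis refers to a phoneme-group pair rather than to a speaker-specific spectral shape, which is the property the abstract relies on for speaker adaptation.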
