Data augmentation and language model adaptation

A method is presented for augmenting word n-gram counts in a matrix that represents a 2-gram language model (LM). The method is based on numerical distances in a reduced space obtained by singular value decomposition (SVD). Rescoring word lattices in a spoken dialogue application with an LM containing the augmented counts led to a word error rate (WER) reduction of 6.5%. Further interpolating the augmented counts with counts extracted from a very large newspaper corpus, but only for selected histories, yielded a total WER reduction of 11.7%. We show that this selective approach gives better results than a global count interpolation over all histories of the LM.
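
The abstract does not give the exact augmentation formula, so the following Python/NumPy code is only a minimal sketch of the general idea under stated assumptions. Rows of the count matrix are histories (previous words) and columns are predicted words. Histories are projected into a k-dimensional space with a truncated SVD, and cosine similarity stands in for the unspecified "numerical distance". Each history then borrows a similarity-weighted fraction of its nearest neighbours' counts. The function names, the parameters k, n_neighbors, alpha and lam, and the boolean `selected` mask (a stand-in for the paper's unspecified history-selection criterion) are all hypothetical.

```python
import numpy as np


def augment_bigram_counts(counts, k=2, n_neighbors=1, alpha=0.5):
    """Hypothetical sketch: add neighbour counts chosen in a truncated-SVD space.

    counts[h, w] is the count of word w following history h. Cosine similarity
    stands in for the paper's unspecified "numerical distance".
    """
    counts = np.asarray(counts, dtype=float)
    # Truncated SVD: keep only the k largest singular values and vectors.
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    hist_vecs = u[:, :k] * s[:k]  # one reduced-space vector per history (row)

    # Cosine similarity between every pair of histories.
    norms = np.linalg.norm(hist_vecs, axis=1, keepdims=True)
    unit = hist_vecs / np.where(norms == 0, 1.0, norms)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a history is never its own neighbour

    augmented = counts.copy()
    for h in range(counts.shape[0]):
        # Indices of the n_neighbors most similar histories.
        nearest = np.argsort(sim[h])[::-1][:n_neighbors]
        for g in nearest:
            # Borrow a fraction of the neighbour's counts, scaled by similarity;
            # neighbours with non-positive similarity contribute nothing.
            weight = alpha * max(sim[h, g], 0.0)
            augmented[h] += weight * counts[g]
    return augmented


def interpolate_selected(augmented, background, selected, lam=0.7):
    """Hypothetical sketch: interpolate with background counts only for selected rows."""
    augmented = np.asarray(augmented, dtype=float)
    background = np.asarray(background, dtype=float)
    mask = np.asarray(selected, dtype=bool)
    mixed = augmented.copy()
    # Histories outside the mask keep their augmented counts unchanged.
    mixed[mask] = lam * augmented[mask] + (1.0 - lam) * background[mask]
    return mixed


# Toy example: histories 0 and 1 share a usage pattern, history 2 does not.
counts = np.array([
    [4, 1, 0, 0],
    [3, 1, 0, 0],
    [0, 0, 5, 2],
])
aug = augment_bigram_counts(counts, k=2, n_neighbors=1, alpha=0.5)

# Toy "background" counts standing in for a large newspaper corpus.
background = np.array([
    [9, 9, 9, 9],
    [2, 2, 2, 2],
    [9, 9, 9, 9],
])
mixed = interpolate_selected(aug, background, selected=[False, True, False], lam=0.5)
print(aug)
print(mixed)
```

In this toy case, histories 0 and 1 end up as each other's nearest neighbours and exchange counts, while history 2 is left essentially unchanged. Only history 1 receives background mass, including for bigrams it never observed. For a realistic vocabulary the matrix is large and sparse, so a sparse truncated solver such as `scipy.sparse.linalg.svds` would likely be more practical than a dense `np.linalg.svd`. Counts from a much larger corpus are also on a different scale and would probably need rescaling before interpolation.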
