Efficient class-based language modelling for very large vocabularies

This paper investigates the perplexity and word error rate performance of two different forms of class model, together with the respective data-driven algorithms for obtaining automatic word classifications. The computational complexity of the clustering algorithm for the conventional two-sided class model is found to be unsuitable for very large vocabularies (>100k words) or large numbers of classes (>2000). A one-sided class model is therefore investigated, and the complexity of its clustering algorithm is found to be substantially lower in such situations. Perplexity results are reported on both English and Russian data; for the latter, both 65k and 430k vocabularies are used. Lattice rescoring experiments are also performed on an English broadcast news task. These experimental results show that both models, when interpolated with a word model, perform similarly well. Moreover, classifications for the one-sided model are obtained in a fraction of the time required by the two-sided model, especially for very large vocabularies.
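
For context, the two bigram class models contrasted in the abstract are commonly written as follows. This is an illustrative formulation consistent with the description above, not equations quoted from the paper itself; here c(w) denotes the automatically derived class of word w, and \lambda is an interpolation weight:

    P_{\text{two-sided}}(w_i \mid w_{i-1}) = P(w_i \mid c(w_i)) \, P(c(w_i) \mid c(w_{i-1}))

    P_{\text{one-sided}}(w_i \mid w_{i-1}) = P(w_i \mid c(w_i)) \, P(c(w_i) \mid w_{i-1})

Interpolation with a word model then takes the usual linear form:

    P(w_i \mid w_{i-1}) = \lambda \, P_{\text{word}}(w_i \mid w_{i-1}) + (1 - \lambda) \, P_{\text{class}}(w_i \mid w_{i-1})

In the one-sided form, classes appear on only one side of the conditional, which is what allows the word-clustering criterion to be optimised much more cheaply than in the two-sided case.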
