Text, Speech and Dialogue

The talk will concern several ideas that combat the sparse data problem of language modeling. All alleviate it, neither solves it. These ideas are: equivalence classification of histories, positional clustering (different cluster systems for different n-gram positions), use of linguistic classes (e.g., Wordnet), class constraints in maximum entropy estimation, random forests, and neural network classification. An interesting problem that must be faced is as follows: words that are sparse and need to be classified do not have sufficient statistics to indicate their appropriate class membership. V. Matoušek and P. Mautner (Eds.): TSD 2003, LNAI 2807, p. 1, 2003. c © Springer-Verlag Berlin Heidelberg 2003 Toward Robust Speech Recognition and Understanding

[1]  Teuvo Kohonen,et al.  Exploration of very large databases by self-organizing maps , 1997, Proceedings of International Conference on Neural Networks (ICNN'97).

[2]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[3]  Teuvo Kohonen,et al.  Self-Organizing Maps , 2010 .