A New Algorithm for Automatic Clustering of Chinese Words

Classbased statistical language model is an effective solution to the dearth of training set. It is a tough task to automatically classify words and also improtant to design a quick algorithm with good convergence. This paper proposed a method for words clustering based on the words' context with perplexity and similarity as a measure. The algorithm combines words classification with improving the performance of classbased language model together. The algorithm is of high executing speed and good clustering performance.