Research on Text Clustering Based on Dependency Treebank

Text clustering is of substantial importance to information retrieval. This paper presents a method that applies syntactic distribution information to text clustering, in order to avoid complex clustering algorithms while allowing both the clustering features and the clustering results to be interpreted linguistically. Based on a dependency treebank, ten dependency relations are identified whose distributions differ markedly between spoken and written Chinese. Using five of them as clustering features, the similarity of the spoken and written classes reaches 71.98% and 83.13%, respectively. The experimental results show that applying dependency relations to text clustering is feasible and effective.
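As a rough illustration of how dependency relations can serve as clustering features, the following minimal sketch represents each text by the relative frequencies of a few selected relations and compares two texts by cosine similarity. It is not the procedure used in the paper: the relation labels in `FEATURES`, the function names, the choice of cosine similarity, and the toy relation sequences are all assumptions made for the example, since the abstract does not name the five relations or the similarity measure.

```python
from collections import Counter
from math import sqrt

# Hypothetical relation labels; the abstract does not say which five are used.
FEATURES = ["SBV", "VOB", "ATT", "ADV", "COO"]


def relation_profile(relations, features=FEATURES):
    """Return the share of each selected relation among all arcs of one text."""
    total = len(relations)
    if total == 0:
        return [0.0] * len(features)
    counts = Counter(relations)
    return [counts[f] / total for f in features]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Invented toy data: the dependency relations of two short texts,
# as they might be read off a dependency treebank.
text_a = ["SBV", "VOB", "ADV", "ADV"]
text_b = ["ATT", "ATT", "SBV", "VOB", "COO"]

profile_a = relation_profile(text_a)
profile_b = relation_profile(text_b)
similarity = cosine(profile_a, profile_b)
```

Because each dimension of a profile corresponds to one named dependency relation, a difference between two groups of texts can be read directly as a difference in syntactic usage, which is the kind of linguistic interpretability the method aims for. Profiles of this form could be passed to any standard clustering routine.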