Document Clustering Algorithm Based on Word Co-occurrence

This paper presents a document clustering algorithm based on word co-occurrence that addresses the loss of information in representing document topics. The algorithm uses word co-occurrence across the document set to build a theme vector representation of each document, applies hierarchical clustering to these vectors, and uses clustering entropy to select the best level of the hierarchy as the final partition, so that the result accurately reflects the relationships between document themes. Experimental results show that the algorithm outperforms other phrase-based hierarchical document clustering algorithms.
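
The pipeline summarized above (co-occurrence counts, theme vectors, hierarchical clustering, entropy-based level selection) can be illustrated with a minimal sketch. The abstract does not specify the weighting scheme, the linkage criterion, or the exact form of the clustering entropy, so everything below is an assumption made for illustration: theme vector weights are summed document-level co-occurrence counts, clustering is average-linkage with cosine similarity, and `clustering_entropy` is a simplified stand-in that adds an intra-cluster dispersion term to an inter-cluster similarity term. All function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

def cooccurrence(docs):
    """For each word pair, count the documents that contain both words."""
    cooc = defaultdict(lambda: defaultdict(int))
    for words in docs:
        for u, w in combinations(sorted(set(words)), 2):
            cooc[u][w] += 1
            cooc[w][u] += 1
    return cooc

def theme_vector(words, cooc):
    """Assumed weighting: weight of w = total co-occurrence of w with the document's words."""
    vec = defaultdict(float)
    for u in set(words):
        for w, n in cooc[u].items():
            vec[w] += n
    return dict(vec)

def cosine(a, b):
    dot = sum(x * b.get(k, 0.0) for k, x in a.items())
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    total = defaultdict(float)
    for v in vectors:
        for k, x in v.items():
            total[k] += x / len(vectors)
    return dict(total)

def hierarchy(vectors):
    """Average-linkage agglomerative clustering; returns the partition at every level."""
    clusters = [[i] for i in range(len(vectors))]
    levels = [[c[:] for c in clusters]]
    while len(clusters) > 1:
        def link(pair):
            a, b = pair
            sims = [cosine(vectors[i], vectors[j])
                    for i in clusters[a] for j in clusters[b]]
            return sum(sims) / len(sims)
        # Merge the most similar pair of clusters (a < b, so deleting b is safe).
        a, b = max(combinations(range(len(clusters)), 2), key=link)
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
        levels.append([c[:] for c in clusters])
    return levels

def clustering_entropy(partition, vectors):
    """Stand-in criterion (an assumption, not the paper's formula): lower is better."""
    overall = centroid(vectors)
    intra = inter = 0.0
    for cluster in partition:
        c = centroid([vectors[i] for i in cluster])
        intra += sum(1.0 - cosine(vectors[i], c) for i in cluster)  # spread inside clusters
        inter += cosine(c, overall)                                 # overlap between clusters
    return intra + inter

def best_partition(docs):
    tokenized = [d.lower().split() for d in docs]
    cooc = cooccurrence(tokenized)
    vectors = [theme_vector(t, cooc) for t in tokenized]
    return min(hierarchy(vectors), key=lambda p: clustering_entropy(p, vectors))

docs = [
    "cat dog pet food",
    "dog pet vet cat",
    "stock market price trade",
    "market trade stock bank",
]
print(best_partition(docs))  # [[0, 1], [2, 3]]
```

In this toy input the two pet documents do not share all of their words, yet their theme vectors coincide because "food" and "vet" both co-occur with the shared words. This is the sense in which a co-occurrence-based representation may recover topic information that a plain bag-of-words vector would miss.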