A Survey of Document Clustering

As an unsupervised machine learning method, document clustering has been widely used in many NLP applications, such as information retrieval and automatic multi-document summarization. This paper first discusses the background and architecture of document clustering, and then surveys several related problems, including clustering algorithms, feature space construction, dimensionality reduction, and the semantic problem. Finally, it introduces the evaluation of cluster quality.