A Private Cloud Document Management System with Document Clustering Algorithm

Enterprises increasingly use virtualization and cloud computing to improve their information management, and private cloud document management systems are moving from the lab to practical application. We present a private cloud document management system whose distinguishing feature is the automatic clustering of files, which enables automated management of text documents. Document clustering has been studied extensively because it is an effective way to organize large numbers of documents. To address the main challenges in current document clustering, namely the huge number of documents, high dimensionality, and the need for comprehensible clusters, we propose a hybrid algorithm based on top-k frequent itemsets and K-Means. Experimental results on two public data sets show that the algorithm outperforms two other representative clustering algorithms in both efficiency and effectiveness. In future work, the algorithm could be further improved through parallel implementation, semantic-based representation, and similarity measurement.

Keywords: private cloud; document management; document clustering; frequent term sets
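
The abstract does not say how the top-k frequent itemsets and K-Means are combined, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that the frequent term sets serve three purposes: they restrict the vocabulary (reducing dimensionality), they seed the initial centroids, and they act as readable cluster labels. The function names `top_k_itemsets` and `hybrid_cluster`, the parameters `min_support` and `max_size`, and the toy documents are all hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def top_k_itemsets(docs, k, min_support=2, max_size=2):
    """Return the k most frequent term sets (up to max_size terms) by document support."""
    counts = Counter()
    for doc in docs:
        terms = sorted(set(doc))
        for size in range(1, max_size + 1):
            counts.update(combinations(terms, size))
    frequent = [(s, c) for s, c in counts.items() if c >= min_support]
    # Higher support first, then larger itemsets, then lexical order for determinism.
    frequent.sort(key=lambda sc: (-sc[1], -len(sc[0]), sc[0]))
    return [s for s, _ in frequent[:k]]


def hybrid_cluster(docs, n_clusters, k=10, min_support=2, max_iter=20):
    """Assumed combination: itemsets pick the vocabulary and seeds, K-Means refines."""
    itemsets = top_k_itemsets(docs, k, min_support)

    # Greedily pick mutually disjoint itemsets as cluster seeds (and labels).
    seeds, used = [], set()
    for s in itemsets:
        if len(seeds) < n_clusters and used.isdisjoint(s):
            seeds.append(s)
            used.update(s)
    if len(seeds) < n_clusters:
        raise ValueError("not enough disjoint frequent itemsets to seed the clusters")

    # Reduced vocabulary: only terms that occur in some top-k itemset.
    vocab = sorted({t for s in itemsets for t in s})
    vectors = [[doc.count(t) for t in vocab] for doc in docs]

    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    # Each initial centroid is the mean of the documents containing its seed itemset.
    centroids = [
        mean([v for v, d in zip(vectors, docs) if set(s) <= set(d)]) for s in seeds
    ]

    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(n_clusters), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(n_clusters):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = mean(members)
    return labels, seeds


# Toy corpus (hypothetical): three storage-related and three clustering-related documents.
docs = [
    ["cloud", "storage", "server"],
    ["cloud", "storage", "virtual"],
    ["cloud", "server", "storage"],
    ["cluster", "kmeans", "centroid"],
    ["cluster", "kmeans", "distance"],
    ["kmeans", "cluster", "centroid"],
]
labels, seeds = hybrid_cluster(docs, n_clusters=2, k=6)
print(labels, seeds)
```

In this sketch the seed itemset of each cluster doubles as a human-readable description of that cluster, which is one way to obtain the comprehensible clusters the abstract calls for. A real system would need a proper frequent-itemset miner and a weighting scheme such as TF-IDF, neither of which is specified in the abstract.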