Paper Information - Study on Similarity Compute and File Filtering Based on Cloud Computing Method

Study on Similarity Compute and File Filtering Based on Cloud Computing Method

Text similarity computation is widely used in confidential-document filtering to improve the security of enterprise information systems, and its accuracy and performance have long been the central problems in document-filtering research. As data volumes grow to massive scale, traditional similarity computation can no longer meet enterprise needs, but cloud computing environments open up new approaches. To address this problem, this paper presents a distributed similarity algorithm for cloud computing environments, based on the mutual information of documents. The algorithm computes text similarity in a cloud computing environment, and the resulting similarity values can be used to implement document filtering. We ran experiments in a Hadoop cloud computing environment, and the results show that the algorithm is both efficient and effective.

Yufei Wang | Bo Zhang | Yuanyuan Ma
