Hadoop Based Parallel Deduplication Method for Web Documents
[1] Wang Jian. Research and Evaluation of Near replicas of Web Pages Detection Algorithms, 2000.
[2] Geoffrey Zweig, et al. Syntactic Clustering of the Web, 1997, Comput. Networks.
[3] Robert E. Tarjan, et al. Self-adjusting binary search trees, 1985, JACM.
[4] Rajeev Motwani, et al. The PageRank Citation Ranking: Bringing Order to the Web, 1999, WWW 1999.
[5] Daniel P. Lopresti, et al. Models and algorithms for duplicate document detection, 1999, Proceedings of the Fifth International Conference on Document Analysis and Recognition. ICDAR '99 (Cat. No.PR00318).
[6] Gerard Salton, et al. Term-Weighting Approaches in Automatic Text Retrieval, 1988, Inf. Process. Manag.
[7] Michael McGill, et al. Introduction to Modern Information Retrieval, 1983.
[8] Qian Song-rong. Duplicate Web Page Elimination Based on HTML and Extraction of Long Sentence, 2009.
[9] Rada Mihalcea, et al. TextRank: Bringing Order into Text, 2004, EMNLP.
[10] Xianghua Xu, et al. Design and Implement of Distributed Document Clustering Based on MapReduce, 2009.
[11] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.
[12] Edward A. Fox, et al. Research Contributions, 2014.