The Research of Web Page De-duplication Based on Web Pages Reshipment Statement
[1] Chen Xiao-zhi. Algorithm of Parallelized Elimination of Duplicated Web Pages Based on Map/Reduce, 2007.
[2] Ding Zhen-Guo, et al. Research of large-scale URL Filter Base on Bloom Filter, 2008.
[3] Gao Kai. The Strategy on Processing Replicated Web Collections, 2006.
[4] Li Xiao-Ming, et al. Two Effective Functions on Hashing URL, 2004.
[5] Jin-yan Chen. Finding near replicas of Web pages based on Fourier transform, 2008.
[6] Daniel P. Lopresti, et al. Models and algorithms for duplicate document detection, 1999, Proceedings of the Fifth International Conference on Document Analysis and Recognition (ICDAR '99).
[7] Zhang Ya-ping. Finding near replicas of Web pages based on Fourier transform, 2008.
[8] Pierre Baldi, et al. Modeling the Internet and the Web: Probabilistic Methods and Algorithms, 2002.
[9] Pierre Baldi, et al. Modeling the Internet and the Web: Probabilistic Methods and Algorithms. By Pierre Baldi, Paolo Frasconi, Padhraic Smyth, John Wiley and Sons Ltd., West Sussex, England, 2003. 285 pp. ISBN 0 470 84906 1, 2006, Inf. Process. Manag.