Detecting near-replicas on the Web by content and hyperlink analysis
Marco Gori | Marco Maggini | Michelangelo Diligenti | Ernesto Di Iorio | Augusto Pucci
[1] Sanjoy Dasgupta, et al. Experiments with Random Projection, 2000, UAI.
[2] Hector Garcia-Molina, et al. Finding replicated Web collections, 2000, SIGMOD '00.
[3] T. Landauer, et al. Indexing by Latent Semantic Analysis, 1990.
[4] Antonin Guttman, et al. R-trees: a dynamic index structure for spatial searching, 1984, SIGMOD '84.
[5] Geoffrey Zweig, et al. Syntactic Clustering of the Web, 1997, Comput. Networks.
[6] Sanjoy Dasgupta, et al. Learning mixtures of Gaussians, 1999, 40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039).
[7] Pavel Zezula, et al. Approximate similarity retrieval with M-trees, 1998, The VLDB Journal.
[8] Andrei Z. Broder, et al. Mirror, Mirror on the Web: A Study of Host Pairs with Replicated Content, 1999, Comput. Networks.
[9] Hector Garcia-Molina, et al. Finding near-replicas of documents on the Web, 1999.
[10] Sergey Brin, et al. The Anatomy of a Large-Scale Hypertextual Web Search Engine, 1998, Comput. Networks.
[11] Andrei Z. Broder, et al. A Comparison of Techniques to Find Mirrored Hosts on the WWW, 2000, IEEE Data Eng. Bull.