User-defined Redundancy in Web Archives
[1] Roberto J. Bayardo, et al. Scaling up all pairs similarity search, 2007, WWW '07.
[2] Jimmy J. Lin, et al. Book Reviews: Data-Intensive Text Processing with MapReduce by Jimmy Lin and Chris Dyer, 2010, CL.
[3] Jimmy J. Lin. Brute force and indexed approaches to pairwise document similarity comparisons with MapReduce, 2009, SIGIR.
[4] Daniel Gomes, et al. Managing duplicates in a web archive, 2006, SAC.
[5] Kjetil Nørvåg. Granularity reduction in temporal document databases, 2006, Inf. Syst.
[6] Felix Naumann, et al. An Introduction to Duplicate Detection, 2010, An Introduction to Duplicate Detection.
[7] Thomas Seidl, et al. CC-MR - Finding Connected Components in Huge Graphs with MapReduce, 2012, ECML/PKDD.
[8] Éva Tardos, et al. Algorithm design, 2005.
[9] Monika Henzinger, et al. Finding near-duplicate web pages: a large-scale evaluation of algorithms, 2006, SIGIR.
[10] Andreas Paepcke, et al. SpotSigs: robust and efficient near duplicate detection in large web collections, 2008, SIGIR '08.
[11] Udi Manber, et al. Finding Similar Files in a Large File System, 1994, USENIX Winter.
[12] Andrei Z. Broder, et al. On the resemblance and containment of documents, 1997, Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No.97TB100171).