Locality sensitive hashing for scalable structural classification and clustering of web documents
[1] William M. Rand, et al. Objective Criteria for the Evaluation of Clustering Methods, 1971.
[2] Thomas Gottron, et al. Clustering Template Based Web Documents, 2008, ECIR.
[3] Benno Stein. Principles of hash-based text retrieval, 2007, SIGIR.
[4] Ziv Bar-Yossef, et al. Template detection via data mining and its applications, 2002, WWW.
[5] Thomas Gottron, et al. Detecting Website Redesigns via Template Similarity on Streams of Documents, 2009.
[6] Terry A. Welch, et al. A Technique for High-Performance Data Compression, 1984, Computer.
[7] Gurmeet Singh Manku, et al. Detecting near-duplicates for web crawling, 2007, WWW '07.
[8] Lei Shi, et al. A DOM Tree Alignment Model for Mining Parallel Data from the Web, 2006, ACL.
[9] David Buttler, et al. A Short Survey of Document Structure Similarity Algorithms, 2004, International Conference on Internet Computing.
[10] Isabel F. Cruz, et al. Measuring Structural Similarity Among Web Documents: Preliminary Results, 1998, EP.
[11] Deepayan Chakrabarti, et al. Page-level template detection via isotonic smoothing, 2007, WWW '07.
[12] Sachindra Joshi, et al. A bag of paths model for measuring structural similarity in Web documents, 2003, KDD '03.
[13] Thomas Gottron. Bridging the gap: from multi document Template Detection to single document Content Extraction, 2008, EuroIMSA.
[14] Ronald L. Rivest, et al. The MD5 Message-Digest Algorithm, 1992, RFC.
[15] Vladimir I. Levenshtein, et al. Binary codes capable of correcting deletions, insertions, and reversals, 1965.
[16] Juliana Freire, et al. On Finding Templates on Web Collections, 2009, World Wide Web.
[17] Lorenzo Blanco, et al. Highly efficient algorithms for structural clustering of large websites, 2011, WWW.
[18] Andreas Paepcke, et al. SpotSigs: robust and efficient near duplicate detection in large web collections, 2008, SIGIR '08.
[20] Alberto H. F. Laender, et al. Automatic web news extraction using tree edit distance, 2004, WWW '04.