Efficient Approach for Near Duplicate Document Detection Using Textual and Conceptual Based Techniques
[1] Wei Wang, et al. Near Duplicate Text Detection Using Frequency-Biased Signatures, 2013, WISE.
[2] Andrei Z. Broder, et al. Identifying and Filtering Near-Duplicate Documents, 2000, CPM.
[3] Monika Henzinger, et al. Finding near-duplicate web pages: a large-scale evaluation of algorithms, 2006, SIGIR.
[4] Daniel T. Larose, et al. Discovering Knowledge in Data: An Introduction to Data Mining, 2005.
[5] Gurmeet Singh Manku, et al. Detecting near-duplicates for web crawling, 2007, WWW '07.
[6] Moses Charikar, et al. Similarity estimation techniques from rounding algorithms, 2002, STOC '02.
[7] Josef Stoer, et al. Numerische Mathematik 1, 1989.
[8] Stephen E. Robertson, et al. Understanding inverse document frequency: on theoretical arguments for IDF, 2004, J. Documentation.
[9] Ángel F. Zazo Rodríguez, et al. Web Document Duplicate Detection Using Fuzzy Hashing, 2011, PAAMS.
[10] Gene H. Golub, et al. Singular value decomposition and least squares solutions, 1970, Milestones in Matrix Computation.
[11] H. Bast, et al. Fast error-tolerant search on very large texts, 2009, SAC '09.
[12] Andreas Paepcke, et al. SpotSigs: robust and efficient near duplicate detection in large web collections, 2008, SIGIR '08.
[13] Feng Zhang, et al. Research on New Algorithm of Topic-Oriented Crawler and Duplicated Web Pages Detection, 2012, ICIC.