Aggregating sentence-level features for Chinese near-duplicate document detection
Shan Gao | Yan Liang | Feng Xu | Ning Feng | Xue Jiang | Yizheng Tao | Zhenjing Wan
[1] Dmitri Loguinov,et al. Probabilistic near-duplicate detection using simhash , 2011, CIKM '11.
[2] Andrei Z. Broder,et al. Identifying and Filtering Near-Duplicate Documents , 2000, CPM.
[3] Yi Yu,et al. Research on Large Scale Documents Deduplication Technique based on Simhash Algorithm , 2015 .
[4] Joshua Alspector,et al. Improved robustness of signature-based near-replica detection via lexicon randomization , 2004, KDD.
[5] Maosong Sun,et al. Semi-Supervised SimHash for Efficient Document Similarity Search , 2011, ACL.
[6] Jongik Kim,et al. Efficient Exact Similarity Searches Using Multiple Token Orderings , 2012, 2012 IEEE 28th International Conference on Data Engineering.
[7] C. V. Guru Rao,et al. XNDDF: Towards a Framework for Flexible Near-Duplicate Document Detection Using Supervised and Unsupervised Learning , 2015 .
[8] Bin Wang,et al. VGRAM: Improving Performance of Approximate Queries on String Collections Using Variable-Length Grams , 2007, VLDB.
[9] Andreas Paepcke,et al. SpotSigs: robust and efficient near duplicate detection in large web collections , 2008, SIGIR '08.
[10] Gurmeet Singh Manku,et al. Detecting near-duplicates for web crawling , 2007, WWW '07.
[11] Moses Charikar,et al. Similarity estimation techniques from rounding algorithms , 2002, STOC '02.
[12] Ophir Frieder,et al. Collection statistics for fast duplicate document detection , 2002, TOIS.
[13] Sung-Ryul Kim,et al. Fingerprint-Based Near-Duplicate Document Detection with Applications to SNS Spam Detection , 2014, Int. J. Distributed Sens. Networks.
[14] Abdur Chowdhury,et al. Lexicon randomization for near-duplicate detection with I-Match , 2007, The Journal of Supercomputing.
[15] Jenq-Haur Wang,et al. Exploiting Sentence-Level Features for Near-Duplicate Document Detection , 2009, AIRS.
[16] Yang Yang,et al. Online system for detection of Chinese near-duplicate documents , 2012, 2012 6th International Conference on New Trends in Information Science, Service Science and Data Mining (ISSDM2012).
[17] Shie-Jue Lee,et al. Detecting near-duplicate documents using sentence-level features and supervised learning , 2013, Expert Syst. Appl.
[18] Geoffrey Zweig,et al. Syntactic Clustering of the Web , 1997, Comput. Networks.
[19] Shengli Wu,et al. Detecting Near-Duplicate Documents Using Sentence Level Features , 2015, DEXA.
[20] James W. Cooper,et al. A novel method for detecting similar documents , 2002, Proceedings of the 35th Annual Hawaii International Conference on System Sciences.
[21] Cordelia Schmid,et al. Aggregating Local Image Descriptors into Compact Codes , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.