Fingerprint-Based Near-Duplicate Document Detection with Applications to SNS Spam Detection
[1] Joshua Alspector, et al. Improved robustness of signature-based near-replica detection via lexicon randomization, 2004, KDD.
[2] Andrei Z. Broder, et al. On the resemblance and containment of documents, 1997, Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No.97TB100171).
[3] Jeffrey D. Ullman, et al. Mining of Massive Datasets: Data Mining, 2011.
[4] Karl Rihaczek, et al. 1. WHAT IS DATA MINING?, 2019, Data Mining for the Social Sciences.
[5] J. Prasanna Kumar, et al. Near-Duplicate Web Page Detection: An Efficient Approach Using Clustering, Sentence Feature and Fingerprinting, 2013, Int. J. Comput. Intell. Syst.
[6] Marc Najork, et al. On the evolution of clusters of near-duplicate Web pages, 2003, Proceedings of the IEEE/LEOS 3rd International Conference on Numerical Simulation of Semiconductor Optoelectronic Devices (IEEE Cat. No.03EX726).
[7] A. Govardhan, et al. Fixing the Threshold for Effective Detection of Near Duplicate Web Documents in Web Crawling, 2010, ADMA.
[8] Geoffrey Zweig, et al. Syntactic Clustering of the Web, 1997, Comput. Networks.
[9] Ceriel J. H. Jacobs, et al. Parsing Techniques - A Practical Guide, 2007, Monographs in Computer Science.
[10] Moses Charikar, et al. Similarity estimation techniques from rounding algorithms, 2002, STOC '02.
[11] Monika Henzinger, et al. Finding near-duplicate web pages: a large-scale evaluation of algorithms, 2006, SIGIR.
[12] Sung-Ryul Kim, et al. Graph-based KNN Algorithm for Spam SMS Detection, 2013, J. Univers. Comput. Sci.
[13] Caitlin Sadowski. SimHash: Hash-based Similarity Detection, 2007.
[14] Ian Witten, et al. Data Mining, 2000.
[15] Eiríkur Rögnvaldsson, et al. A Mixed Method Lemmatization Algorithm Using a Hierarchy of Linguistic Identities (HOLI), 2008, GoTAL.
[16] James W. Cooper, et al. A novel method for detecting similar documents, 2002, Proceedings of the 35th Annual Hawaii International Conference on System Sciences.
[17] Fathy E. Eassa, et al. Near Duplicate Document Detection Survey, 2012.