Deduplication in Databases using Locality Sensitive Hashing and Bloom filter
[1] Juan Enrique Ramos, et al. Using TF-IDF to Determine Word Relevance in Document Queries, 2003.
[2] Marvin Theimer, et al. Reclaiming space from duplicate files in a serverless distributed file system, 2002, Proceedings 22nd International Conference on Distributed Computing Systems.
[3] Peter Christen, et al. A Comparison of Personal Name Matching: Techniques and Practical Issues, 2006, Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06).
[4] Piotr Indyk, et al. Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality, 2012, Theory Comput.
[5] Salvatore J. Stolfo, et al. Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem, 1998, Data Mining and Knowledge Discovery.
[6] Andrew McCallum, et al. Efficient clustering of high-dimensional data sets with application to reference matching, 2000, KDD '00.
[7] Jian Shen, et al. Secure similarity-based cloud data deduplication in Ubiquitous city, 2017, Pervasive Mob. Comput.
[8] Ibrahim Moukouop Nguena, et al. Fast Semantic Duplicate Detection Techniques in Databases, 2017.
[9] Kadhum Alnoory, et al. Performance Evaluation of Similarity Functions for Duplicate Record Detection, 2011.
[10] C. Lee Giles, et al. Adaptive sorted neighborhood methods for efficient record linkage, 2007, JCDL '07.
[11] Keizo Oyama, et al. A Fast Linkage Detection Scheme for Multi-Source Information Integration, 2005, International Workshop on Challenges in Web Information Retrieval and Integration.
[12] James Allan, et al. Using Soundex Codes for Indexing Names in ASR Documents, 2004, HLT-NAACL 2004.
[13] Peter Christen, et al. A Comparison of Fast Blocking Methods for Record Linkage, 2003, KDD 2003.
[14] Alan M. Frieze, et al. Min-Wise Independent Permutations, 2000, J. Comput. Syst. Sci.
[15] Carlos Alberto Heuser, et al. A fast approach for parallel deduplication on multicore processors, 2011, SAC '11.
[16] Matthew A. Jaro, et al. Advances in Record-Linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida, 1989.