A fast approach for parallel deduplication on multicore processors
[1] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.
[2] Hector Garcia-Molina, et al. D-Swoosh: A Family of Algorithms for Generic, Distributed Entity Resolution, 2007, 27th International Conference on Distributed Computing Systems (ICDCS '07).
[3] Sanjay Chawla, et al. Robust record linkage blocking using suffix arrays, 2009, CIKM.
[4] Chen Li, et al. Efficient parallel set-similarity joins using MapReduce, 2010, SIGMOD Conference.
[5] Dongwon Lee, et al. Parallel linkage, 2007, CIKM '07.
[6] Christoforos E. Kozyrakis, et al. Evaluating MapReduce for Multi-core and Multiprocessor Systems, 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[7] Peter Christen, et al. Febrl - A Parallel Open Source Data Linkage System: http://datamining.anu.edu.au/linkage.html , 2004, PAKDD.
[8] Peter Christen, et al. A Comparison of Fast Blocking Methods for Record Linkage, 2003, KDD 2003.
[9] Ahmed K. Elmagarmid, et al. Duplicate Record Detection: A Survey, 2007, IEEE Transactions on Knowledge and Data Engineering.
[10] Panagiotis G. Ipeirotis, et al. Duplicate Record Detection: A Survey, 2007.
[11] Keizo Oyama, et al. A Fast Linkage Detection Scheme for Multi-Source Information Integration, 2005, International Workshop on Challenges in Web Information Retrieval and Integration.
[12] Ann Q. Gates, et al. TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005.
[13] Andrew McCallum, et al. Efficient clustering of high-dimensional data sets with application to reference matching, 2000, KDD '00.
[14] Wagner Meira, et al. A Scalable Parallel Deduplication Algorithm, 2007, 19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'07).
[15] Peter Christen, et al. Probabilistic Data Generation for Deduplication and Data Linkage, 2005, IDEAL.