Multi-pass sorted neighborhood blocking with MapReduce
Andreas Thor | Erhard Rahm | Lars Kolb
[1] Randy H. Katz,et al. Above the Clouds: A Berkeley View of Cloud Computing , 2009 .
[2] Salvatore J. Stolfo,et al. The merge/purge problem for large databases , 1995 .
[3] David J. DeWitt,et al. Practical Skew Handling in Parallel Joins , 1992, VLDB.
[4] Peter Christen,et al. A Comparison of Fast Blocking Methods for Record Linkage , 2003, KDD 2003.
[5] Erhard Rahm,et al. Data Partitioning for Parallel Entity Matching , 2010, ArXiv.
[6] David J. DeWitt,et al. Parallel database systems: the future of high performance database systems , 1992, CACM.
[7] Randy H. Katz,et al. A view of cloud computing , 2010, CACM.
[8] Peter Christen,et al. Febrl - A Parallel Open Source Data Linkage System: http://datamining.anu.edu.au/linkage.html , 2004, PAKDD.
[9] Salvatore J. Stolfo,et al. The merge/purge problem for large databases , 1995, SIGMOD '95.
[10] Erhard Rahm,et al. Frameworks for entity matching: A comparison , 2010, Data Knowl. Eng..
[11] Sanjay Ghemawat,et al. MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.
[12] Dongwon Lee,et al. Parallel linkage , 2007, CIKM '07.
[13] Chen Li,et al. Efficient parallel set-similarity joins using MapReduce , 2010, SIGMOD Conference.
[14] Carlo Batini,et al. Data Quality: Concepts, Methodologies and Techniques (Data-Centric Systems and Applications) , 2006 .
[15] Raymond J. Mooney,et al. Adaptive duplicate detection using learnable string similarity measures , 2003, KDD '03.
[16] Andreas Thor,et al. Evaluation of entity resolution approaches on real-world match problems , 2010, Proc. VLDB Endow..
[17] Erhard Rahm,et al. Data Cleaning: Problems and Current Approaches , 2000, IEEE Data Eng. Bull..
[18] Andreas Thor,et al. Parallel Sorted Neighborhood Blocking with MapReduce , 2011, BTW.
[19] Peter Christen,et al. Febrl: an open source data cleaning, deduplication and record linkage system with a graphical user interface , 2008, KDD.
[20] Ahmed K. Elmagarmid,et al. Duplicate Record Detection: A Survey , 2007, IEEE Transactions on Knowledge and Data Engineering.
[21] Andreas Thor,et al. Learning-Based Approaches for Matching Web Data Entities , 2010, IEEE Internet Computing.