An automatic blocking mechanism for large-scale de-duplication tasks