Swoosh: a generic approach to entity resolution
Jennifer Widom | Hector Garcia-Molina | Qi Su | Steven Euijong Whang | Omar Benjelloun | David Menestrina