A Survey of Indexing Techniques for Scalable Record Linkage and Deduplication
[1] George V. Moustakides,et al. Optimal Stopping: A Record-Linkage Approach , 2009, JDIQ.
[2] Jianmin Wang,et al. Effectively Indexing the Uncertain Space , 2010, IEEE Transactions on Knowledge and Data Engineering.
[3] Peter Christen,et al. Febrl: an open source data cleaning, deduplication and record linkage system with a graphical user interface , 2008, KDD.
[4] Jiaheng Lu,et al. Space-Constrained Gram-Based Indexing for Efficient Approximate String Search , 2009, 2009 IEEE 25th International Conference on Data Engineering.
[5] William E. Winkler,et al. Methods for evaluating and creating data quality , 2004, Inf. Syst.
[6] Dennis Shasha,et al. An extensible Framework for Data Cleaning , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).
[7] Jayant Madhavan,et al. Reference reconciliation in complex information spaces , 2005, SIGMOD '05.
[8] Erhard Rahm,et al. Data Cleaning: Problems and Current Approaches , 2000, IEEE Data Eng. Bull.
[9] Georgia Koutrika,et al. Entity resolution with iterative blocking , 2009, SIGMOD Conference.
[10] Weifeng Su,et al. Record Matching over Query Results from Multiple Web Databases , 2010, IEEE Transactions on Knowledge and Data Engineering.
[11] Hans-Peter Kriegel,et al. Scalable Probabilistic Similarity Ranking in Uncertain Databases , 2010, IEEE Transactions on Knowledge and Data Engineering.
[12] Josep-Lluís Larriba-Pey,et al. On the Use of Semantic Blocking Techniques for Data Cleansing and Integration , 2007, 11th International Database Engineering and Applications Symposium (IDEAS 2007).
[13] Peter Christen,et al. Preparation of name and address data for record linkage using hidden Markov models , 2002, BMC Medical Informatics Decis. Mak.
[14] Craig A. Knoblock,et al. Learning domain-independent string transformation weights for high accuracy object identification , 2002, KDD.
[15] Sugato Basu,et al. Adaptive product normalization: using online learning for record linkage in comparison shopping , 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).
[16] Felix Naumann,et al. Industry-scale duplicate detection , 2008, Proc. VLDB Endow.
[17] Salvatore J. Stolfo,et al. The merge/purge problem for large databases , 1995, SIGMOD '95.
[18] Lifang Gu,et al. Decision Models for Record Linkage , 2006, Selected Papers from AusDM.
[19] Sunita Sarawagi,et al. Efficient set joins on similarity predicates , 2004, SIGMOD '04.
[20] Ahmed K. Elmagarmid,et al. Duplicate Record Detection: A Survey , 2007, IEEE Transactions on Knowledge and Data Engineering.
[21] W. Winkler. Overview of Record Linkage and Current Research Directions , 2006.
[22] Salvatore J. Stolfo,et al. Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem , 1998, Data Mining and Knowledge Discovery.
[23] Anuradha Bhamidipaty,et al. Interactive deduplication using active learning , 2002, KDD.
[24] Wen-tau Yih,et al. Adaptive near-duplicate detection via similarity learning , 2010, SIGIR.
[25] M. Harada,et al. Finding authoritative people from the Web , 2004, Proceedings of the 2004 Joint ACM/IEEE Conference on Digital Libraries, 2004.
[26] Chen Li,et al. Efficient record linkage in large data sets , 2003, Eighth International Conference on Database Systems for Advanced Applications, 2003. (DASFAA 2003). Proceedings.
[27] Felix Naumann,et al. A Comparison and Generalization of Blocking and Windowing Algorithms for Duplicate Detection , 2009.
[28] Andrian Marcus,et al. Data Cleansing: Beyond Integrity Analysis , 2000, IQ.
[29] Luis Gravano,et al. Approximate String Joins in a Database (Almost) for Free , 2001, VLDB.
[30] William W. Cohen,et al. Learning to match and cluster large high-dimensional data sets for data integration , 2002, KDD.
[31] Mikhail Bilenko and Raymond J. Mooney. On Evaluation and Training-Set Construction for Duplicate Detection , 2003.
[32] Peter Christen,et al. Quality and Complexity Measures for Data Linkage and Deduplication , 2007, Quality Measures in Data Mining.
[33] Pradeep Ravikumar,et al. A Comparison of String Distance Metrics for Name-Matching Tasks , 2003, IIWeb.
[34] Jordi Nin Guerrero,et al. On the use of semantic blocking techniques for data cleansing and integration , 2007.
[35] Peter Christen,et al. A Comparison of Personal Name Matching: Techniques and Practical Issues , 2006, Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06).
[36] Philip S. Yu,et al. The IGrid index: reversing the dimensionality curse for similarity indexing in high dimensional space , 2000, KDD '00.
[37] Keizo Oyama,et al. A Fast Linkage Detection Scheme for Multi-Source Information Integration , 2005, International Workshop on Challenges in Web Information Retrieval and Integration.
[38] Andrian Marcus,et al. Data Cleansing: Beyond Integrity Analysis , 2000.
[39] William W. Cohen. Integration of heterogeneous databases without common domains using queries based on textual similarity , 1998, SIGMOD '98.
[40] Divesh Srivastava,et al. Flexible String Matching Against Large Databases in Practice , 2004, VLDB.
[41] Ahmed K. Elmagarmid,et al. TAILOR: a record linkage toolbox , 2002, Proceedings 18th International Conference on Data Engineering.
[42] Noha Adly. Efficient Record Linkage using a Double Embedding Scheme , 2009, DMIN.
[43] Raymond J. Mooney,et al. Adaptive Blocking: Learning to Scale Up Record Linkage , 2006, Sixth International Conference on Data Mining (ICDM'06).
[44] Peter Christen. Towards Parameter-free Blocking for Scalable Record Linkage , 2007.
[45] A. J. Bass,et al. Research use of linked health data — a best practice protocol , 2002, Australian and New Zealand journal of public health.
[46] Vijay S. Mookerjee,et al. Efficient Techniques for Online Record Linkage , 2011, IEEE Transactions on Knowledge and Data Engineering.
[47] D. Clark,et al. Practical introduction to record linkage for injury research , 2004, Injury Prevention.
[48] Peter Christen,et al. Towards Automated Record Linkage , 2006, AusDM.
[49] Peter Christen,et al. Automatic record linkage using seeded nearest neighbour and support vector machine classification , 2008, KDD.
[50] Xuemin Lin,et al. Ed-Join: an efficient algorithm for similarity joins with edit distance constraints , 2008, Proc. VLDB Endow.
[51] Ivan P. Fellegi,et al. A Theory for Record Linkage , 1969.
[52] Craig A. Knoblock,et al. Learning Blocking Schemes for Record Linkage , 2006, AAAI.
[53] Lise Getoor,et al. Collective entity resolution in relational data , 2007, TKDD.
[54] Jim Harper,et al. Effective Counterterrorism and the Limited Role of Predictive Data Mining , 2006.
[55] David Hawking,et al. Similarity-aware indexing for real-time entity resolution , 2009, CIKM.
[56] Peter Christen,et al. A Comparison of Fast Blocking Methods for Record Linkage , 2003, KDD 2003.
[57] Peter Christen,et al. Accurate Synthetic Generation of Realistic Personal Information , 2009, PAKDD.
[58] Sanjay Chawla,et al. Robust record linkage blocking using suffix arrays , 2009, CIKM.
[59] Christos Faloutsos,et al. FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets , 1995, SIGMOD '95.
[60] Sanjay Chawla,et al. Robust Record Linkage Blocking Using Suffix Arrays and Bloom Filters , 2011, TKDD.
[61] C. Lee Giles,et al. Adaptive sorted neighborhood methods for efficient record linkage , 2007, JCDL '07.
[62] Andrew McCallum,et al. Efficient clustering of high-dimensional data sets with application to reference matching , 2000, KDD '00.