Effective Incremental Clustering for Duplicate Detection in Large Databases
[1] Mattis Neiling,et al. The Object Identification Framework , 2003 .
[2] Piotr Indyk,et al. Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.
[3] Sudipto Guha,et al. ROCK: a robust clustering algorithm for categorical attributes , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).
[4] Surajit Chaudhuri,et al. Eliminating Fuzzy Duplicates in Data Warehouses , 2002, VLDB.
[5] William W. Cohen,et al. Learning to Match and Cluster Entity Names , 2001 .
[6] P. Ivax,et al. A Theory for Record Linkage , 2004 .
[7] Sudipto Guha,et al. CURE: an efficient clustering algorithm for large databases , 1998, SIGMOD '98.
[8] James C. French,et al. Clustering large datasets in arbitrary metric spaces , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).
[9] Dan Gusfield,et al. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology , 1997 .
[10] Andrew McCallum,et al. Efficient clustering of high-dimensional data sets with application to reference matching , 2000, KDD '00.
[11] Charles Elkan,et al. An Efficient Domain-Independent Algorithm for Detecting Approximately Duplicate Database Records , 1997, DMKD.
[12] Dmitri V. Kalashnikov,et al. Exploiting Relationships for Domain-Independent Data Cleaning , 2005, SDM.
[13] Eugenio Cesario,et al. An incremental clustering scheme for duplicate detection in large databases , 2005, 9th International Database Engineering & Application Symposium (IDEAS'05).
[14] Geoffrey Zweig,et al. Syntactic Clustering of the Web , 1997, Comput. Networks.
[15] Hans-Peter Kriegel,et al. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.
[16] Raymond J. Mooney,et al. Adaptive duplicate detection using learnable string similarity measures , 2003, KDD '03.
[17] Anuradha Bhamidipaty,et al. Interactive deduplication using active learning , 2002, KDD.
[18] Esko Ukkonen,et al. Approximate String Matching with q-grams and Maximal Matches , 1992, Theor. Comput. Sci..
[19] Salvatore J. Stolfo,et al. The merge/purge problem for large databases , 1995, SIGMOD '95.
[20] Pradeep Ravikumar,et al. A Comparison of String Distance Metrics for Name-Matching Tasks , 2003, IIWeb.
[21] Piotr Indyk,et al. Similarity Search in High Dimensions via Hashing , 1999, VLDB.
[22] Charles Elkan,et al. The Field Matching Problem: Algorithms and Applications , 1996, KDD.
[23] Ricardo A. Baeza-Yates,et al. Searching in metric spaces , 2001, CSUR.
[24] Dimitrios Gunopulos,et al. Efficient and tunable similar set retrieval , 2001, SIGMOD '01.
[25] William E. Winkler,et al. String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage , 1990 .
[26] Hanan Samet,et al. Index-driven similarity search in metric spaces (Survey Article) , 2003, TODS.
[27] Luis Gravano,et al. Approximate String Joins in a Database (Almost) for Free , 2001, VLDB.
[28] William W. Cohen,et al. Learning to match and cluster large high-dimensional data sets for data integration , 2002, KDD.
[29] Rajeev Motwani,et al. Robust and efficient fuzzy match for online data cleaning , 2003, SIGMOD '03.