Eliminating Fuzzy Duplicates in Data Warehouses

Surajit Chaudhuri | Venkatesh Ganti | Rohit Ananthakrishna

[1] J. Leluk, et al. A new approach to sequence comparison and similarity estimation, 2004.

[2] P. Ivax, et al. A Theory for Record Linkage, 2004.

[3] Erhard Rahm, et al. Generic Schema Matching with Cupid, 2001, VLDB.

[4] Dennis Shasha, et al. Declarative Data Cleaning: Language, Model, and Algorithms, 2001, VLDB.

[5] Luis Gravano, et al. Approximate String Joins in a Database (Almost) for Free, 2001, VLDB.

[6] Joseph M. Hellerstein, et al. Potter's Wheel: An Interactive Data Cleaning System, 2001, VLDB.

[7] Sunita Sarawagi, et al. Automatic segmentation of text into structured records, 2001, SIGMOD '01.

[8] Ömer Egecioglu, et al. A new approach to sequence comparison: normalized sequence alignment, 2001, RECOMB.

[9] Yvette Salaün, et al. Information quality: meeting the needs of the consumer, 2001, Int. J. Inf. Manag.

[10] Dennis Shasha, et al. AJAX: an extensible data cleaning tool, 2000, SIGMOD '00.

[11] Dennis Shasha, et al. An extensible Framework for Data Cleaning, 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[12] Jon M. Kleinberg, et al. Clustering categorical data: an approach based on dynamical systems, 2000, The VLDB Journal.

[13] Erhard Rahm, et al. Data Cleaning: Problems and Current Approaches, 2000, IEEE Data Eng. Bull.

[14] Felix Naumann, et al. Do Metadata Models meet IQ Requirements?, 1999, IQ.

[15] Johannes Gehrke, et al. CACTUS: clustering categorical data using summaries, 1999, KDD '99.

[16] Sudipto Guha, et al. ROCK: a robust clustering algorithm for categorical attributes, 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[17] Jonathan Goldstein, et al. When Is "Nearest Neighbor" Meaningful?, 1999, ICDT.

[18] William W. Cohen. Integration of heterogeneous databases without common domains using queries based on textual similarity, 1998, SIGMOD '98.

[19] Hannu Toivonen, et al. Efficient discovery of functional and approximate dependencies using partitions, 1998, Proceedings 14th International Conference on Data Engineering.

[20] Geoffrey Zweig, et al. Syntactic Clustering of the Web, 1997, Comput. Networks.

[21] Charles Elkan, et al. An Efficient Domain-Independent Algorithm for Detecting Approximately Duplicate Database Records, 1997, DMKD.

[22] Charles Elkan, et al. The Field Matching Problem: Algorithms and Applications, 1996, KDD.

[23] Heikki Mannila, et al. Approximate Inference of Functional Dependencies from Relations, 1995, Theor. Comput. Sci.

[24] Salvatore J. Stolfo, et al. The merge/purge problem for large databases, 1995, SIGMOD '95.

[25] Heikki Mannila, et al. Algorithms for Inferring Functional Dependencies from Relations, 1994, Data Knowl. Eng.

[26] Heikki Mannila, et al. Approximate Dependency Inference from Relations, 1992, ICDT.

[27] W. R. Buckland. Outliers in Statistical Data, 1979.

[28] Ivan P. Fellegi, et al. A Theory for Record Linkage, 1969.

[29] George Kingsley Zipf, et al. Human behavior and the principle of least effort, 1949.