Eliminating Fuzzy Duplicates in Data Warehouses

[1]  J. Leluk,et al.  A new approach to sequence comparison and similarity estimation , 2004 .

[2]  P. Ivax,et al.  A Theory for Record Linkage , 2004 .

[3]  Erhard Rahm,et al.  Generic Schema Matching with Cupid , 2001, VLDB.

[4]  Dennis Shasha,et al.  Declarative Data Cleaning: Language, Model, and Algorithms , 2001, VLDB.

[5]  Luis Gravano,et al.  Approximate String Joins in a Database (Almost) for Free , 2001, VLDB.

[6]  Joseph M. Hellerstein,et al.  Potter's Wheel: An Interactive Data Cleaning System , 2001, VLDB.

[7]  Sunita Sarawagi,et al.  Automatic segmentation of text into structured records , 2001, SIGMOD '01.

[8]  Ömer Egecioglu,et al.  A new approach to sequence comparison: normalized sequence alignment , 2001, RECOMB.

[9]  Yvette Salaün,et al.  Information quality: meeting the needs of the consumer , 2001, Int. J. Inf. Manag.

[10]  Dennis Shasha,et al.  AJAX: an extensible data cleaning tool , 2000, SIGMOD '00.

[11]  Dennis Shasha,et al.  An Extensible Framework for Data Cleaning , 2000, Proceedings of 16th International Conference on Data Engineering.

[12]  Jon M. Kleinberg,et al.  Clustering categorical data: an approach based on dynamical systems , 2000, The VLDB Journal.

[13]  Erhard Rahm,et al.  Data Cleaning: Problems and Current Approaches , 2000, IEEE Data Eng. Bull.

[14]  Felix Naumann,et al.  Do Metadata Models meet IQ Requirements? , 1999, IQ.

[15]  Johannes Gehrke,et al.  CACTUS—clustering categorical data using summaries , 1999, KDD '99.

[16]  Sudipto Guha,et al.  ROCK: a robust clustering algorithm for categorical attributes , 1999, Proceedings 15th International Conference on Data Engineering.

[17]  Jonathan Goldstein,et al.  When Is "Nearest Neighbor" Meaningful? , 1999, ICDT.

[18]  William W. Cohen.  Integration of heterogeneous databases without common domains using queries based on textual similarity , 1998, SIGMOD '98.

[19]  Hannu Toivonen,et al.  Efficient discovery of functional and approximate dependencies using partitions , 1998, Proceedings 14th International Conference on Data Engineering.

[20]  Geoffrey Zweig,et al.  Syntactic Clustering of the Web , 1997, Comput. Networks.

[21]  Charles Elkan,et al.  An Efficient Domain-Independent Algorithm for Detecting Approximately Duplicate Database Records , 1997, DMKD.

[22]  Charles Elkan,et al.  The Field Matching Problem: Algorithms and Applications , 1996, KDD.

[23]  Heikki Mannila,et al.  Approximate Inference of Functional Dependencies from Relations , 1995, Theor. Comput. Sci.

[24]  Salvatore J. Stolfo,et al.  The merge/purge problem for large databases , 1995, SIGMOD '95.

[25]  Heikki Mannila,et al.  Algorithms for Inferring Functional Dependencies from Relations , 1994, Data Knowl. Eng.

[26]  Heikki Mannila,et al.  Approximate Dependency Inference from Relations , 1992, ICDT.

[27]  W. R. Buckland.  Outliers in Statistical Data , 1979 .

[28]  Ivan P. Fellegi,et al.  A Theory for Record Linkage , 1969 .

[29]  George Kingsley Zipf,et al.  Human behavior and the principle of least effort , 1949 .