Explicitly Involving the User in a Data Cleaning Process

Data cleaning and Extract-Transform-Load processes are usually modeled as graphs of data transformations. These graphs typically involve a large number of data transformations, and must handle large amounts of data. The involvement of the users responsible for executing the corresponding programs over real data is important to tune data transformations and to manually correct data items that cannot be treated automat-
