Case Study on Modeling Approaches and Framework of Scientific Data Cleaning

This paper presents a case study on scientific data cleaning. It proposes new ideas and applies current technologies to the problem. A scientific data cleaning tool faces three challenges: representing and using domain knowledge, customizing the cleaning flow, and building the tool dynamically. We address these with knowledge-based rule modeling, workflow-based flow modeling, and a cleaning framework built from pluggable components. The proposed approaches have been used in a project on oceanographic data cleaning. Theory and practice show that the proposed approaches and framework help build a flexible and extensible data cleaning tool.
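To make the three ideas concrete, the following minimal sketch shows how domain rules, a configurable flow, and pluggable components might fit together. It is an illustration only, not the framework described in the paper: the registry, the component names (`range_check`, `round_depth`), the field names, and the temperature bounds are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, float]
# A cleaning step returns the (possibly modified) record, or None to reject it.
Step = Callable[[Record], Optional[Record]]

# Hypothetical plug-in registry: components register themselves under a name.
REGISTRY: Dict[str, Step] = {}


def component(name: str) -> Callable[[Step], Step]:
    """Decorator that plugs a cleaning step into the registry."""
    def register(fn: Step) -> Step:
        REGISTRY[name] = fn
        return fn
    return register


@component("range_check")
def range_check(rec: Record) -> Optional[Record]:
    # Illustrative domain rule; the bounds are assumed, not taken from the paper.
    if -2.0 <= rec["temperature_c"] <= 40.0:
        return rec
    return None


@component("round_depth")
def round_depth(rec: Record) -> Optional[Record]:
    # Normalize depth to one decimal place without mutating the input record.
    return {**rec, "depth_m": round(rec["depth_m"], 1)}


def run_flow(flow: List[str], records: List[Record]) -> List[Record]:
    """Apply the named components in order; drop records that any step rejects."""
    cleaned: List[Record] = []
    for rec in records:
        current: Optional[Record] = rec
        for name in flow:
            current = REGISTRY[name](current)
            if current is None:
                break
        if current is not None:
            cleaned.append(current)
    return cleaned
```

Because the flow is just a list of component names, a different cleaning sequence can be assembled from configuration without changing the components themselves, which is the kind of flexibility the abstract attributes to the workflow and pluggable-component design.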
