Data Quality Mining using Genetic Algorithm

Data quality mining (DQM) is a new and promising data mining approach from the academic and the business point of view. Data quality is important to organizations. People use information attributes as a tool for assessing data quality. The goal of DQM is to employ data mining methods in order to detect, quantify, explain and correct data quality deficiencies in very large databases. Data quality is crucial for many applications of knowledge discovery in databases (KDD). In this work, we have considered four data qualities like accuracy, comprehensibility, interestingness and completeness. We have tried to develop Multi-objective Genetic Algorithm (GA) based approach utilizing linkage between feature selection and association rule. The main motivation for using GA in the discovery of high-level prediction rules is that they perform a global search and cope better with attribute interaction that the greedy rule induction algorithms often used in data mining.

[1]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[2]  Gerson Zaverucha,et al.  Genetic Based Machine Learning: Merging Pittsburgh and Michigan, an Implicit Feature Selection Mechanism and a New Crossover Operator , 2006, 2006 Sixth International Conference on Hybrid Intelligent Systems (HIS'06).

[3]  Erick Cantú-Paz,et al.  Feature Subset Selection, Class Separability, and Genetic Algorithms , 2004, GECCO.

[4]  Alex A. Freitas,et al.  Discovering interesting prediction rules with a genetic algorithm , 1999, Proceedings of the 1999 Congress on Evolutionary Computation-CEC99 (Cat. No. 99TH8406).

[5]  Sushant Sachdeva,et al.  Dimension Reduction , 2008, Encyclopedia of GIS.

[6]  Ulrich Güntzer,et al.  Data Quality Mining - Making a Virute of Necessity , 2001, DMKD.

[7]  R. Agarwal Fast Algorithms for Mining Association Rules , 1994, VLDB 1994.

[8]  Cheng-Hong Yang,et al.  Dimensionality Reduction using GA-PSO , 2006, JCIS.