Improving Accuracy and Coverage of Data Mining Systems that are Built from Noisy Datasets: A New Model

Problem statement: Noise in datasets has to be dealt with in most circumstances. This noise includes misclassified data as well as missing data; simple human error is treated as misclassification. Such errors decrease the accuracy of a data mining system, making it less likely to be used. The objective was to propose an effective algorithm for dealing with noise in the form of missing data in datasets. Approach: A model for improving the accuracy and coverage of data mining systems was proposed, and an algorithm for this model was constructed. The algorithm deals with missing values in datasets. It splits the original dataset into two new datasets: one contains the tuples that have no missing values and the other contains the tuples that have missing values. The proposed algorithm was applied to each of the two new datasets: it finds the reduct of each and then merges the two reducts into one new dataset that is ready for training. Results: The results showed that the proposed model increases the accuracy and coverage of the tested dataset compared with traditional models. Conclusion: The proposed algorithm performs effectively and generates better results than the previous ones.
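The split, reduce, and merge steps described in the Approach can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the reduct is computed, so `drop_duplicate_tuples` is a hypothetical placeholder that only removes repeated tuples, and the names `split_by_missing` and `build_training_set`, the use of `None` to mark a missing value, and the sample rows are all invented for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Assumption: a tuple is a dict of attribute -> value, and None marks a missing value.
Row = Dict[str, Optional[str]]


def split_by_missing(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Split rows into (complete, incomplete) depending on whether any value is missing."""
    complete = [r for r in rows if all(v is not None for v in r.values())]
    incomplete = [r for r in rows if any(v is None for v in r.values())]
    return complete, incomplete


def drop_duplicate_tuples(rows: List[Row]) -> List[Row]:
    """Hypothetical stand-in for the reduct step: keep the first copy of each distinct tuple."""
    seen = set()
    kept = []
    for r in rows:
        # Sort by attribute name only, so None values are never compared with strings.
        key = tuple(sorted(r.items(), key=lambda kv: kv[0]))
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


def build_training_set(
    rows: List[Row],
    reduce_fn: Callable[[List[Row]], List[Row]] = drop_duplicate_tuples,
) -> List[Row]:
    """Split the data, reduce each part separately, then merge the two reduced parts."""
    complete, incomplete = split_by_missing(rows)
    return reduce_fn(complete) + reduce_fn(incomplete)


# Invented sample data for illustration only.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},  # duplicate, complete
    {"outlook": "rain", "windy": None, "play": "no"},
    {"outlook": "rain", "windy": None, "play": "no"},    # duplicate, incomplete
    {"outlook": None, "windy": "yes", "play": "no"},
]

training = build_training_set(rows)
print(len(training))  # 3
```

Passing the reduction step as `reduce_fn` keeps the split and merge logic independent of how the reduct is actually obtained, so a real reduct computation could be substituted without changing the rest of the sketch.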
