Data preprocessing refers to any processing applied to raw data to prepare it for a subsequent processing step. Commonly used as a preliminary step in data mining, preprocessing methods transform the data into a format that classification algorithms can process more easily and effectively. In this paper, a novel data preprocessing method is proposed and evaluated on three difficult classification data sets from the well-known UCI Repository, on which various classifiers achieve an average performance below 75%. The three UCI data sets are Mammographic Masses, Indian Liver, and Contraceptive Method. The performance of the proposed method and of Principal Component Analysis (PCA) preprocessing was evaluated with 10-fold cross-validation using five classification algorithms: the nearest-neighbour classifier (IB1), the C4.5 implementation (J48), Random Forest, Multilayer Perceptron, and Rotation Forest. The classification results are presented and compared analytically. The results indicate that the features generated by applying the proposed preprocessing method to the original data sets markedly improve the performance of the classification algorithms.
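The 10-fold cross-validation protocol mentioned above can be illustrated with a minimal sketch. It assumes a plain 1-nearest-neighbour rule as a stand-in for IB1 and uses synthetic two-class data rather than any of the UCI data sets. It does not reproduce the proposed preprocessing method or the original experimental setup, and the function names are hypothetical.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def predict_1nn(train_X, train_y, x):
    """Return the label of the training point closest to x (squared Euclidean)."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]

def cross_val_accuracy(X, y, k=10, seed=0):
    """Mean accuracy of 1-NN over k folds; each fold is held out exactly once."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(predict_1nn(train_X, train_y, X[i]) == y[i]
                      for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k

# Synthetic, well-separated two-class data (an assumption for illustration only).
rng = random.Random(42)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
     + [[rng.gauss(10, 1), rng.gauss(10, 1)] for _ in range(100)])
y = [0] * 100 + [1] * 100

accuracy = cross_val_accuracy(X, y, k=10)
print(f"10-fold CV accuracy (1-NN): {accuracy:.3f}")
```

To compare preprocessing methods in this style, the same fold split would be reused for every method, with each transformation fitted on the training folds only so that no information from the held-out fold leaks into the features.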