Imp-Chi2 Algorithm for Discretization of Real Value Attributes

Discretization is an effective technique for handling continuous attributes in machine learning and data mining. The soundness of a discretization process is determined by how accurately information is expressed and extracted. Based on an analysis of the Chi2 family of algorithms, a new algorithm called Imp-Chi2, which is based on attribute significance, is proposed. The algorithm adjusts the order in which attributes are discretized according to their level of significance and discretizes the real-valued attributes accurately. Experiments are performed on the discretized data using C4.5 and SVM, respectively. For these experiments, a method of selecting the training set according to class proportion is presented. The method overcomes the poorly distributed training sets that random selection can produce. Experimental results show that the proposed algorithm is effective.
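The selection of a training set according to class proportion can be illustrated with a minimal sketch. It is not the procedure used in the experiments, whose details are not given here; it only shows the general idea of drawing the same fraction of samples from every class. The function name `stratified_split`, the parameter `train_fraction`, and the rule of keeping at least one training sample per class are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices so each class keeps roughly the same proportion.

    Hypothetical helper: returns (train_indices, test_indices).
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction from every class; assumed rule: at least
        # one training sample per class so that no class is left out.
        k = max(1, round(len(indices) * train_fraction))
        train.extend(indices[:k])
        test.extend(indices[k:])
    return sorted(train), sorted(test)


# Usage: an imbalanced data set with 10 samples of class "a" and 90 of "b".
labels = ["a"] * 10 + ["b"] * 90
train_idx, test_idx = stratified_split(labels, train_fraction=0.7)
```

With purely random selection, a rare class such as "a" may be under-represented in, or even missing from, the training set. Sampling within each class keeps the class ratio of the training set close to that of the full data set.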