A novel Univariate Marginal Distribution Algorithm based discretization algorithm

Abstract Many data mining algorithms can handle only discrete data, or perform better on discrete data; in practice, however, for technological reasons the available data are often continuous. Discretization therefore plays an important role in data mining. Discretization is the process of mapping a continuous attribute space into a discrete space, that is, using integer values or symbols to represent intervals of the continuous space. In this paper, we propose a discretization method based on the Univariate Marginal Distribution Algorithm (UMDA), which combines statistical learning theory with evolutionary algorithms. The fitness function of the UMDA takes into account both the accuracy of the classifier and the number of breakpoints. Experimental results show that the proposed algorithm effectively reduces the number of breakpoints while also improving the accuracy of the classifier.
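
The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: a single attribute, candidate breakpoints placed at midpoints between adjacent distinct values, a majority-class-per-interval classifier as a stand-in for the classifier used in the paper, a fitness of the form accuracy minus a penalty proportional to the fraction of breakpoints kept, and truncation selection of the top half of the population. The names `umda_discretize`, `fitness`, and `penalty` are hypothetical.

```python
import random
from bisect import bisect_right
from collections import Counter, defaultdict


def discretize(value, cuts):
    """Map a continuous value to the index of the interval it falls in."""
    return bisect_right(cuts, value)


def accuracy(values, labels, cuts):
    """Stand-in classifier (assumption): predict each interval's majority class."""
    bins = defaultdict(Counter)
    for v, y in zip(values, labels):
        bins[discretize(v, cuts)][y] += 1
    correct = sum(c.most_common(1)[0][1] for c in bins.values())
    return correct / len(values)


def fitness(mask, candidates, values, labels, penalty=0.1):
    """Assumed fitness: accuracy minus a penalty on the share of breakpoints kept."""
    cuts = [c for c, bit in zip(candidates, mask) if bit]
    return accuracy(values, labels, cuts) - penalty * len(cuts) / len(candidates)


def umda_discretize(values, labels, pop_size=40, generations=30, seed=0):
    """Search for a small breakpoint set with a basic UMDA over bit masks."""
    rng = random.Random(seed)
    points = sorted(set(values))
    # Candidate breakpoints: midpoints between adjacent distinct values.
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    if not candidates:
        return []
    # One independent Bernoulli probability per candidate breakpoint.
    probs = [0.5] * len(candidates)
    best_fit, best_mask = float("-inf"), None
    for _ in range(generations):
        # Sample a population from the current univariate marginals.
        population = [[int(rng.random() < p) for p in probs]
                      for _ in range(pop_size)]
        scored = sorted(
            ((fitness(m, candidates, values, labels), m) for m in population),
            key=lambda pair: pair[0],
            reverse=True,
        )
        if scored[0][0] > best_fit:
            best_fit, best_mask = scored[0]
        # Truncation selection: keep the better half of the population.
        selected = [m for _, m in scored[: pop_size // 2]]
        # Re-estimate each marginal from the selected individuals,
        # clamped so that no bit becomes permanently fixed.
        probs = [
            min(0.95, max(0.05, sum(m[i] for m in selected) / len(selected)))
            for i in range(len(candidates))
        ]
    return [c for c, bit in zip(candidates, best_mask) if bit]
```

Calling `umda_discretize(values, labels)` with a list of attribute values and their class labels returns the selected breakpoints in ascending order. Raising `penalty` in `fitness` shifts the trade-off toward fewer breakpoints at the possible cost of accuracy.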
