Heuristic discretization method for Bayesian Networks

A Bayesian Network (BN) is a classification technique widely used in Artificial Intelligence. Its structure is a Directed Acyclic Graph (DAG) used to model the association between categorical variables. However, when the variables are numerical, they must first be discretized. Discretization methods are usually based on a statistical approach that uses the data distribution, such as division by quartiles. In this article we present a discretization method based on a heuristic that identifies events called peaks and valleys. A Genetic Algorithm (GA) was used to identify these events, with the objective function being the minimization of the error between the average estimated by the BN and the actual value of the numerical output variable. The BN was modeled from a database of drill bit Rate of Penetration (ROP) in the Brazilian pre-salt layer, with 5 numerical variables and one categorical variable, using both the proposed discretization and the division of the data into quartiles. The results show that the proposed heuristic discretization achieves higher accuracy than the quartile discretization.
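The abstract does not define how peak and valley events are detected or how the GA chromosome is encoded, so the sketch below does not reproduce the proposed heuristic. It only illustrates, under simplifying assumptions, the two ingredients being compared: quartile (equal-frequency) cut points, and cut points chosen by a genetic algorithm that minimizes the error between an estimated average and the actual numeric output. All function names are hypothetical. As a stand-in for the BN's estimated average, the output `y` is estimated by the mean of `y` among samples whose input `x` falls in the same bin (a single-parent network `x -> y` with hard evidence), and the error is assumed to be the mean absolute error; the article's network, with five numerical and one categorical variable, is not modeled here.

```python
import bisect
import random
import statistics


def discretize(values, cuts):
    """Map each numeric value to a bin index, given ascending cut points."""
    return [bisect.bisect_right(cuts, v) for v in values]


def quantile_cuts(values, n_bins=4):
    """Equal-frequency cut points; n_bins=4 gives the quartiles."""
    return statistics.quantiles(values, n=n_bins)


def estimate_error(x, y, cuts):
    """Mean absolute error when y is estimated by the mean of y in x's bin.

    Simplified stand-in (assumption) for the BN's estimated average:
    one parent x, one numeric output y, hard evidence on the bin of x.
    """
    bins = discretize(x, cuts)
    groups = {}
    for b, target in zip(bins, y):
        groups.setdefault(b, []).append(target)
    means = {b: statistics.fmean(ts) for b, ts in groups.items()}
    return statistics.fmean(abs(means[b] - t) for b, t in zip(bins, y))


def ga_cuts(x, y, n_cuts=3, pop_size=30, generations=60, seed=0):
    """Choose n_cuts cut points for x with a small real-coded genetic algorithm."""
    rng = random.Random(seed)
    lo, hi = min(x), max(x)
    # Seed one individual with the quantile cuts; the rest are random.
    pop = [quantile_cuts(x, n_cuts + 1)]
    pop += [sorted(rng.uniform(lo, hi) for _ in range(n_cuts))
            for _ in range(pop_size - 1)]

    for _ in range(generations):
        scores = [estimate_error(x, y, ind) for ind in pop]  # lower is better
        order = sorted(range(pop_size), key=scores.__getitem__)

        def pick():
            # Tournament selection of size 3: the lowest error wins.
            return pop[min(rng.sample(range(pop_size), 3), key=scores.__getitem__)]

        new_pop = [pop[order[0]], pop[order[1]]]  # elitism: keep the two best
        while len(new_pop) < pop_size:
            p1, p2 = pick(), pick()
            # Arithmetic crossover, then Gaussian mutation clipped to [lo, hi].
            w = rng.random()
            child = [w * a + (1 - w) * b for a, b in zip(p1, p2)]
            child = [min(hi, max(lo, c + rng.gauss(0, 0.05 * (hi - lo))))
                     for c in child]
            new_pop.append(sorted(child))
        pop = new_pop

    return min(pop, key=lambda ind: estimate_error(x, y, ind))


if __name__ == "__main__":
    # Toy data (not from the article): y jumps at x = 10 and x = 90.
    xs = list(range(100))
    ys = [10.0 if 10 <= v < 90 else 0.0 for v in xs]
    q = quantile_cuts(xs)
    g = ga_cuts(xs, ys)
    print("quartile cuts", q, "MAE", estimate_error(xs, ys, q))
    print("GA cuts", g, "MAE", estimate_error(xs, ys, g))
```

Because the initial population is seeded with the quartile cut points and the best individuals survive every generation, the GA result can only match or improve on the quartile error on the data it was fitted to; that guarantee says nothing about accuracy on unseen data.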