Learning Bayesian Networks Using Feature Selection

This paper introduces a novel enhancement for learning Bayesian networks, with a bias toward small networks that have high predictive accuracy. The new approach selects a subset of features that maximizes predictive accuracy before the network learning phase. We explicitly examine the effects of two aspects of the algorithm: feature selection and node ordering. Our approach generates networks that are computationally simpler to evaluate and that display predictive accuracy comparable to that of Bayesian networks that model all attributes.
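The abstract does not spell out the selection procedure, so the following is only a minimal sketch of the general idea: choose a feature subset by predictive accuracy before any network is learned. It assumes a greedy forward search, leave-one-out accuracy as the score, and a small naive Bayes classifier as a stand-in for the network learner. All function names (`nb_predict`, `loo_accuracy`, `select_features`) and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def nb_predict(train, features, row):
    """Naive Bayes prediction restricted to `features` (Laplace-smoothed).

    Stand-in classifier: an assumption, not the paper's network learner.
    """
    class_counts = Counter(label for _, label in train)
    best_label, best_score = None, float("-inf")
    for label, n_c in class_counts.items():
        score = math.log(n_c / len(train))  # log prior of the class
        for f in features:
            n_values = len({x[f] for x, _ in train})
            n_match = sum(1 for x, y in train if y == label and x[f] == row[f])
            score += math.log((n_match + 1) / (n_c + n_values))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def loo_accuracy(data, features):
    """Leave-one-out predictive accuracy using only `features`."""
    hits = 0
    for i, (row, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += nb_predict(train, features, row) == label
    return hits / len(data)


def select_features(data, n_features):
    """Greedy forward selection: add the feature that most improves accuracy,
    and stop as soon as no remaining feature gives a strict improvement."""
    selected, best_acc = [], loo_accuracy(data, [])
    while True:
        best_f = None
        for f in range(n_features):
            if f in selected:
                continue
            acc = loo_accuracy(data, selected + [f])
            if acc > best_acc:
                best_f, best_acc = f, acc
        if best_f is None:
            return selected, best_acc
        selected.append(best_f)


# Toy data (hypothetical): the label copies feature 0; features 1 and 2 are noise.
DATA = [
    ((0, 0, 0), 0), ((0, 1, 0), 0), ((0, 0, 1), 0), ((0, 1, 1), 0),
    ((1, 0, 0), 1), ((1, 1, 0), 1), ((1, 0, 1), 1), ((1, 1, 1), 1),
]

selected, acc = select_features(DATA, n_features=3)
print(selected, acc)
```

On this toy data the search should keep only feature 0, since the other two features carry no information about the label. In a setting like the one the abstract describes, the selected subset would then be passed to the network learning phase, so that the learned network models fewer attributes.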
