BAYDA: Software for Bayesian Classification and Feature Selection

BAYDA is a software package for flexible data analysis in predictive data mining tasks. The mathematical model underlying the program is a simple Bayesian network, the Naive Bayes classifier. The Naive Bayes classifier is well known to perform well in predictive data mining tasks when compared to approaches using more complex models. However, the model makes strong independence assumptions that are frequently violated in practice. For this reason, the BAYDA software also provides a feature selection scheme that can be used both for analyzing the problem domain and for improving the prediction accuracy of the models BAYDA constructs. The scheme is based on a novel Bayesian feature selection criterion introduced in this paper. The criterion is inspired by the Cheeseman-Stutz approximation for computing the marginal likelihood of Bayesian networks with hidden variables. Empirical results on several widely used data sets demonstrate that the automated Bayesian feature selection scheme can dramatically reduce the number of features used and lead to substantial improvements in prediction accuracy.
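The abstract describes the selection criterion only at this level of detail. The Python sketch below illustrates one standard Bayesian formulation of the idea: each discrete feature is scored by comparing the exact Dirichlet-multinomial marginal likelihood of a class-dependent model against a class-independent one, and classification is done with a posterior-predictive Naive Bayes over the selected features. The function names, the symmetric Dirichlet prior alpha, and the per-feature decision rule are assumptions of this sketch, not BAYDA's actual Cheeseman-Stutz-inspired criterion, which differs in its details.

```python
"""Minimal sketch: Bayesian feature selection for a discrete Naive Bayes
classifier. All names, priors, and the decision rule are illustrative
assumptions; this is not BAYDA's implementation."""
import numpy as np
from scipy.special import gammaln


def log_dirichlet_multinomial(counts, alpha=1.0):
    """Exact log marginal likelihood of multinomial count data under a
    symmetric Dirichlet(alpha) prior (closed form, no approximation)."""
    counts = np.asarray(counts, dtype=float)
    k = counts.size
    return (gammaln(k * alpha) - gammaln(k * alpha + counts.sum())
            + np.sum(gammaln(counts + alpha)) - k * gammaln(alpha))


def select_features(X, y, alpha=1.0):
    """Keep feature j iff modelling it as class-dependent yields a higher
    marginal likelihood than modelling it as class-independent."""
    classes = np.unique(y)
    selected = []
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # One multinomial over all data: feature is irrelevant to the class.
        pooled = log_dirichlet_multinomial(
            [np.sum(X[:, j] == v) for v in values], alpha)
        # One multinomial per class: feature is relevant to the class.
        conditional = sum(
            log_dirichlet_multinomial(
                [np.sum((X[:, j] == v) & (y == c)) for v in values], alpha)
            for c in classes)
        if conditional > pooled:
            selected.append(j)
    return selected


def predict(X, y, x_new, features, alpha=1.0):
    """Posterior-predictive Naive Bayes classification restricted to the
    selected features."""
    classes = np.unique(y)
    scores = []
    for c in classes:
        mask = y == c
        lp = np.log((mask.sum() + alpha) / (y.size + alpha * classes.size))
        for j in features:
            values = np.unique(X[:, j])
            hits = np.sum(mask & (X[:, j] == x_new[j]))
            lp += np.log((hits + alpha) / (mask.sum() + alpha * values.size))
        scores.append(lp)
    return classes[int(np.argmax(scores))]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, 400)
    informative = (y + rng.integers(0, 2, 400)) % 3   # depends on the class
    noise = rng.integers(0, 3, 400)                   # independent of the class
    X = np.column_stack([informative, noise])
    feats = select_features(X, y)
    print("selected features:", feats)                # expect [0]
    print("prediction:", predict(X, y, X[0], feats))
```

Because the fully observed Naive Bayes marginal likelihood factorizes over features, the sketch can decide each feature independently rather than searching over subsets; a hidden-variable model such as the ones the Cheeseman-Stutz approximation targets would not decompose this way.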
