Learning Selective Averaged One-Dependence Estimators for Probability Estimation

Naive Bayes is a well-known, effective, and efficient classification algorithm, but its probability estimation performance is poor. Averaged one-dependence estimators (AODE) is a recently proposed semi-naive Bayes algorithm that delivers high classification accuracy at a modest computational cost. In many data mining applications, however, accurate probability estimation is more desirable for making optimal decisions. Probability estimation performance is usually measured by conditional log likelihood (CLL). In this paper, we first study the probability estimation performance of AODE and compare it with naive Bayes, tree-augmented naive Bayes, CLLTree, C4.4 (the version of C4.5 modified for better probability estimation), and support vector machines. Our experiments show that AODE performs significantly better than all of the compared algorithms except C4.4; it performs only slightly better than C4.4, although its classification accuracy is significantly better than that of C4.5. We then propose an efficient forward greedy attribute selection algorithm for AODE that uses the CLL score to select attributes. The experimental results show that our algorithm achieves a substantial improvement over AODE and significantly outperforms C4.4. All experiments are conducted on 36 UCI data sets covering a wide range of domains and data characteristics, and all algorithms are run within the Weka platform.
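To make the pipeline concrete, the sketch below shows how the three ingredients above fit together: AODE probability estimates, the CLL score, and forward greedy attribute selection. It is a minimal illustration under stated assumptions, not the paper's implementation: attributes are assumed discrete, Laplace smoothing is used, candidate subsets are scored on a held-out validation set (the paper's exact scoring protocol may differ), standard AODE's parent-frequency threshold is omitted, and all names (`AODE`, `cll`, `greedy_select`) are our own rather than from the paper or Weka.

```python
import math
from collections import Counter

class AODE:
    """Averaged one-dependence estimators over a chosen attribute subset.

    P(c, x) is approximated by averaging, over each selected attribute j
    acting as the parent, the one-dependence estimate
    P(c, x_j) * prod_i P(x_i | c, x_j).
    """

    def __init__(self, X, y, attrs):
        self.attrs = list(attrs)              # indices of selected attributes
        self.classes = sorted(set(y))
        self.n = len(X)
        # Distinct values seen per attribute (used in smoothing denominators).
        self.vals = {i: {row[i] for row in X} for i in self.attrs}
        self.pair = Counter()                 # counts of (class, j, x_j)
        self.triple = Counter()               # counts of (class, j, x_j, i, x_i)
        for row, c in zip(X, y):
            for j in self.attrs:
                self.pair[(c, j, row[j])] += 1
                for i in self.attrs:
                    self.triple[(c, j, row[j], i, row[i])] += 1

    def class_probs(self, x):
        """Posterior P(c | x), normalized over the training classes."""
        scores = []
        for c in self.classes:
            s = 0.0
            for j in self.attrs:
                # Laplace-smoothed P(c, x_j).
                p = (self.pair[(c, j, x[j])] + 1.0) / (
                    self.n + len(self.classes) * len(self.vals[j]))
                for i in self.attrs:
                    # Laplace-smoothed P(x_i | c, x_j).
                    p *= (self.triple[(c, j, x[j], i, x[i])] + 1.0) / (
                        self.pair[(c, j, x[j])] + len(self.vals[i]))
                s += p
            scores.append(s / len(self.attrs))
        total = sum(scores)
        return {c: s / total for c, s in zip(self.classes, scores)}

def cll(model, X, y):
    """Conditional log likelihood: sum over instances of log P(true class | x)."""
    return sum(math.log(model.class_probs(x).get(c, 0.0) + 1e-12)
               for x, c in zip(X, y))

def greedy_select(X_train, y_train, X_val, y_val):
    """Forward greedy attribute selection for AODE, scored by CLL.

    Starting from the empty set, repeatedly add the attribute whose
    inclusion yields the largest validation CLL; stop when no remaining
    attribute improves the score.
    """
    d = len(X_train[0])
    selected, remaining = [], set(range(d))
    best = -float("inf")
    while remaining:
        score, a = max(
            (cll(AODE(X_train, y_train, selected + [a]), X_val, y_val), a)
            for a in remaining)
        if score <= best:
            break
        best = score
        selected.append(a)
        remaining.remove(a)
    return selected, best
```

A full AODE corresponds to `AODE(X, y, range(d))`; the greedy variant instead grows the attribute subset one attribute at a time, so each step costs one model fit and one CLL evaluation per remaining attribute, which keeps the search efficient relative to evaluating all subsets.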
