To Select or To Weigh: A Comparative Study of Model Selection and Model Weighing for SPODE Ensembles

An ensemble of Super-Parent-One-Dependence Estimators (SPODEs) offers a powerful yet simple alternative to naive Bayes classifiers, achieving significantly higher classification accuracy at a moderate cost in classification efficiency. Two families of methodologies currently exist for ensembling candidate SPODEs for classification. One is to select only the helpful SPODEs and uniformly average their probability estimates, a methodology known as model selection. The other is to assign a weight to each SPODE and linearly combine their probability estimates, a methodology known as model weighing. This paper presents a theoretical and empirical study comparing model selection and model weighing for ensembling SPODEs. The focus is on maximizing the ensemble's classification accuracy while minimizing its computational time. A number of representative selection and weighing schemes are studied, providing comprehensive coverage of this topic and identifying effective schemes that offer alternative trade-offs between speed and expected error.
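
To make the contrast between the two methodologies concrete, the sketch below is a minimal illustration, not the paper's implementation; all function names, table layouts, and the toy data are hypothetical. Each SPODE with super-parent attribute x_p estimates the joint probability P(y, x) = P(y, x_p) * prod_{i != p} P(x_i | y, x_p) from training counts with Laplace smoothing; the per-SPODE estimates are then combined either by uniform averaging over a selected subset (model selection) or by a weighted linear combination over all SPODEs (model weighing).

```python
import numpy as np

def spode_class_scores(x, p, X, y, n_classes, n_vals):
    """Joint estimates P(y, x) for the SPODE whose super-parent is attribute p:
    P(y, x) = P(y, x_p) * prod_{i != p} P(x_i | y, x_p),
    computed from training counts with Laplace smoothing (illustrative sketch)."""
    n, d = X.shape
    scores = np.empty(n_classes)
    for c in range(n_classes):
        sp = (y == c) & (X[:, p] == x[p])     # training cases sharing class c and x_p
        base = sp.sum()
        score = (base + 1.0) / (n + n_classes * n_vals[p])   # ~ P(y = c, x_p)
        for i in range(d):
            if i == p:
                continue
            match = (sp & (X[:, i] == x[i])).sum()
            score *= (match + 1.0) / (base + n_vals[i])      # ~ P(x_i | y = c, x_p)
        scores[c] = score
    return scores

def ensemble_predict(spode_probs, selected=None, weights=None):
    """Combine per-SPODE estimates (one row of spode_probs per super-parent).

    selected : boolean mask -- model selection: uniformly average the
               estimates of the chosen SPODEs only.
    weights  : per-SPODE weights -- model weighing: linearly combine the
               estimates of all SPODEs.
    With neither given, all SPODEs are averaged uniformly (AODE-style)."""
    if selected is not None:
        combined = spode_probs[selected].mean(axis=0)
    elif weights is not None:
        w = np.asarray(weights, dtype=float)
        combined = (w[:, None] * spode_probs).sum(axis=0) / w.sum()
    else:
        combined = spode_probs.mean(axis=0)
    return int(np.argmax(combined)), combined / combined.sum()

# Toy usage on synthetic binary data (for illustration only).
rng = np.random.default_rng(0)
X, y = rng.integers(0, 2, size=(100, 3)), rng.integers(0, 2, size=100)
x_new, n_vals = np.array([1, 0, 1]), [2, 2, 2]
probs = np.array([spode_class_scores(x_new, p, X, y, 2, n_vals) for p in range(3)])
print(ensemble_predict(probs, selected=np.array([True, False, True])))  # model selection
print(ensemble_predict(probs, weights=[0.5, 0.3, 0.2]))                 # model weighing
```

In practice, the selection mask or the weights would be produced by one of the selection or weighing schemes the paper studies; here they are fixed constants purely to show where each methodology enters the prediction step.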
