An Enhanced Random Linear Oracle Ensemble Method using Feature Selection Approach based on Naïve Bayes Classifier

The Random Linear Oracle (RLO) ensemble replaces each classifier with a mini-ensemble of two sub-classifiers, allowing the base classifiers to be trained on different subsets of the data and thereby increasing the variety of the trained classifiers. The Naïve Bayes (NB) classifier was chosen as the base classifier for this research because of its simplicity and low computational cost. Different feature selection algorithms are applied to the RLO ensemble to investigate how feature subsets of different sizes affect its performance. Experiments were carried out on 30 data sets from the UCI repository with 6 learning algorithms: the NB classifier, the RLO ensemble, the RLO ensemble trained with Genetic Algorithm (GA) feature selection using the accuracy of the NB classifier as the fitness function, the RLO ensemble trained with GA feature selection using the accuracy of the RLO ensemble as the fitness function, the RLO ensemble trained with t-test feature selection, and the RLO ensemble trained with Kruskal-Wallis test feature selection. The results showed that the RLO ensemble could significantly improve the diversity of the NB classifier when dealing with distinctively selected feature sets through its fusion-selection paradigm. Consequently, feature selection algorithms could greatly benefit the RLO ensemble: with a properly selected number of features from the filter approach, or with GA natural selection from the wrapper approach, it achieved a large improvement in classification accuracy as well as growth in diversity.
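
The oracle mechanism summarized above can be illustrated with a minimal sketch. It is not the implementation used in the experiments. It assumes numeric features, a small hand-written Gaussian NB, an oracle hyperplane taken as the perpendicular bisector of two randomly drawn training points, and majority voting as the fusion step. All names (`GaussianNB`, `OracleMember`, `rlo_fit`, `rlo_predict`) are hypothetical, and feature selection is not shown.

```python
import math
import random
from collections import Counter, defaultdict


class GaussianNB:
    """Tiny Gaussian Naive Bayes for numeric features (illustrative only)."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.stats = {}
        for label, rows in by_class.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(rows) for c in cols]
            # The small floor (an assumption) avoids zero variance.
            variances = [
                sum((v - m) ** 2 for v in c) / len(rows) + 1e-9
                for c, m in zip(cols, means)
            ]
            log_prior = math.log(len(rows) / len(X))
            self.stats[label] = (log_prior, means, variances)
        return self

    def predict_one(self, x):
        def log_posterior(label):
            log_prior, means, variances = self.stats[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
                for v, m, var in zip(x, means, variances)
            )
        return max(self.stats, key=log_posterior)


class OracleMember:
    """One ensemble member: a random hyperplane plus one NB model per side."""

    def side(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) >= self.offset

    def fit(self, X, y, rng, max_tries=100):
        for _ in range(max_tries):
            # Hyperplane = perpendicular bisector of two random training points.
            a, b = rng.sample(X, 2)
            self.w = [ai - bi for ai, bi in zip(a, b)]
            mid = [(ai + bi) / 2 for ai, bi in zip(a, b)]
            self.offset = sum(wi * mi for wi, mi in zip(self.w, mid))
            sides = [self.side(x) for x in X]
            if 0 < sum(sides) < len(X):  # both sides must be non-empty
                break
        else:
            raise ValueError("could not split the data with a random oracle")
        # Selection step: each sub-classifier sees only its side of the split.
        self.models = []
        for s in (False, True):
            Xs = [x for x, sd in zip(X, sides) if sd == s]
            ys = [lab for lab, sd in zip(y, sides) if sd == s]
            self.models.append(GaussianNB().fit(Xs, ys))
        return self

    def predict_one(self, x):
        # The oracle picks which of the two sub-classifiers answers.
        return self.models[int(self.side(x))].predict_one(x)


def rlo_fit(X, y, n_members=10, seed=0):
    rng = random.Random(seed)
    return [OracleMember().fit(X, y, rng) for _ in range(n_members)]


def rlo_predict(members, x):
    # Fusion step: majority vote over members (an assumption of this sketch).
    votes = Counter(m.predict_one(x) for m in members)
    return votes.most_common(1)[0][0]
```

In this sketch every member is trained on the same data, so the members differ only in their oracle. A filter or wrapper feature selection step would reduce the columns of `X` before `rlo_fit` is called.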
