A hybrid approach for efficient ensembles

An ensemble of classifiers, or a systematic combination of individual classifiers, often classifies more accurately than a single classifier. However, which classifiers should be chosen in a given situation to construct an optimal ensemble has long been debated. In addition, ensembles are often computationally expensive, since a single classification task requires executing multiple classifiers. To address these problems, we propose a hybrid approach that integrates Data Envelopment Analysis (DEA) and stacking to select and combine data mining models into ensembles. Experimental results demonstrate the efficiency and effectiveness of the proposed approach.
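The abstract does not say which DEA model, inputs, or outputs the approach uses. As a minimal sketch of the selection step only, assuming one DEA input (a per-classification cost such as execution time) and one DEA output (held-out accuracy), the Python below computes input-oriented BCC (variable returns to scale) efficiency for each candidate classifier and keeps the efficient ones. The `Candidate` type, the function names, and all numbers are hypothetical, and the stacking step that combines the selected models is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A trained base classifier described by one DEA input and one output."""
    name: str
    cost: float      # hypothetical DEA input, e.g. seconds per classification
    accuracy: float  # hypothetical DEA output, e.g. held-out accuracy


def bcc_input_efficiency(o: Candidate, units: list[Candidate]) -> float:
    """Input-oriented BCC (variable returns to scale) efficiency of `o`.

    Solves  min theta
            s.t. sum_j l_j * cost_j     <= theta * o.cost
                 sum_j l_j * accuracy_j >= o.accuracy
                 sum_j l_j = 1,  l_j >= 0
    With one input and one output, some optimal solution uses at most two
    units, so enumerating single units and "straddling" pairs is exact.
    """
    best = o.cost  # assumes `o` is in `units`, so `o` alone is feasible
    # Single units that reach at least o's accuracy.
    for j in units:
        if j.accuracy >= o.accuracy:
            best = min(best, j.cost)
    # Convex mixes of a less and a more accurate unit that hit o's accuracy.
    for i in units:
        for k in units:
            if i.accuracy < o.accuracy < k.accuracy:
                lam = (o.accuracy - i.accuracy) / (k.accuracy - i.accuracy)
                best = min(best, (1 - lam) * i.cost + lam * k.cost)
    return best / o.cost


def select_efficient(units: list[Candidate], tol: float = 1e-9) -> list[Candidate]:
    """Keep the candidates that lie on the DEA frontier (efficiency == 1)."""
    return [u for u in units if bcc_input_efficiency(u, units) >= 1 - tol]


if __name__ == "__main__":
    # Hypothetical pool of base classifiers (numbers are made up).
    pool = [
        Candidate("A", cost=1.0, accuracy=0.80),
        Candidate("B", cost=2.0, accuracy=0.86),
        Candidate("C", cost=4.0, accuracy=0.88),
        Candidate("D", cost=3.0, accuracy=0.84),
        Candidate("E", cost=6.0, accuracy=0.89),
        Candidate("F", cost=5.0, accuracy=0.85),
    ]
    for c in pool:
        print(f"{c.name}: {bcc_input_efficiency(c, pool):.3f}")
    print([c.name for c in select_efficient(pool)])  # ['A', 'B', 'C', 'E']
```

In this made-up pool, D and F are dropped because a cheaper model (or a mix of cheaper models) reaches at least their accuracy; the surviving models would then act as the level-0 learners whose held-out predictions train a stacking meta-learner. Variable returns to scale is chosen here so that several points on the cost/accuracy frontier can be efficient at once; that choice is an assumption, not something the source states.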