Proposing a classifier ensemble framework based on classifier selection and decision tree

Abstract Classification is one of the most important tasks in pattern recognition, machine learning, and data mining. Introducing a general classifier that can learn any problem's dataset remains a challenge for the pattern recognition community. Many classifiers have been proposed, but each has its own strengths and weaknesses, so each performs well only on specific problems, and there is no reliable way to determine in advance which classifier suits a given problem. Fortunately, ensemble learning offers a way to build a near-optimal classification system for any problem. One of the most challenging issues in classifier ensembles is assembling a suitable set of base classifiers. Every ensemble needs diversity: for a group of classifiers to form a successful ensemble, they must be diverse enough to compensate for one another's errors. A mechanism is therefore needed during ensemble creation to ensure that the ensemble's classifiers are diverse; such a mechanism may select or remove a subset of base classifiers so as to maintain diversity. This paper proposes a novel ensemble-creation method named Classifier Selection Based on Clustering (CSBC). To ensure diversity among the ensemble's classifiers, CSBC clusters the classifiers. Bagging is used to produce the base classifiers, all of which are of a single type: either decision trees or multilayer perceptrons. After producing a number of base classifiers, CSBC partitions them with a clustering algorithm and then forms the final ensemble by selecting one classifier from each cluster. Weighted majority voting is used as the aggregation function. In this paper we investigate the influence of the number of clusters on the performance of CSBC, and we examine how to choose a good approximate value for the number of clusters on any dataset.
We base our study on a large number of real datasets from the UCI repository to reach a definitive result.
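The pipeline sketched in the abstract (bagging, clustering of classifiers, selection of one classifier per cluster, weighted majority vote) can be illustrated as follows. This is only a minimal sketch under assumptions not stated in the abstract: the classifiers are clustered on their prediction vectors over a held-out validation set, and each selected classifier's vote weight is its validation accuracy. Neither choice is taken from the paper itself.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_classes=2, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# 1) Bagging: train homogeneous base classifiers (decision trees here)
#    on bootstrap samples of the training set.
n_base, n_clusters = 15, 5
base = []
for _ in range(n_base):
    idx = rng.integers(0, len(X_tr), len(X_tr))
    base.append(DecisionTreeClassifier(random_state=0).fit(X_tr[idx], y_tr[idx]))

# 2) Cluster the classifiers by their validation predictions, so that
#    classifiers that behave similarly fall into the same cluster
#    (assumed clustering representation, not from the paper).
pred_matrix = np.array([clf.predict(X_val) for clf in base])
labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(pred_matrix)

# 3) Select the most accurate classifier from each cluster; taking one
#    representative per cluster keeps the final ensemble diverse.
ensemble, weights = [], []
for k in range(n_clusters):
    members = [i for i in range(n_base) if labels[i] == k]
    best = max(members, key=lambda i: (pred_matrix[i] == y_val).mean())
    ensemble.append(base[best])
    weights.append((pred_matrix[best] == y_val).mean())

# 4) Aggregate with a weighted majority vote over the selected classifiers.
def predict(X_new):
    votes = np.zeros((len(X_new), 2))
    for clf, w in zip(ensemble, weights):
        votes[np.arange(len(X_new)), clf.predict(X_new)] += w
    return votes.argmax(axis=1)

acc = (predict(X_val) == y_val).mean()
```

The sketch fixes the number of clusters at 5; the paper's point is precisely that this number matters and should be tuned per dataset.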
