An effective ensemble pruning algorithm based on frequent patterns

Ensemble pruning is crucial for balancing predictive accuracy against predictive efficiency. Previous ensemble methods demand large amounts of memory and heavy computation on large-scale datasets, which makes them inefficient for classification. To address this issue, this paper proposes a novel ensemble pruning algorithm based on frequent-pattern mining, called EP-FP. The method maps the dataset and the pruned ensemble to a transactional database in which each transaction corresponds to an instance and each item corresponds to a base classifier. A Boolean matrix, called the classification matrix, compresses the classification results of the pruned ensemble on the dataset. The problem of ensemble pruning is thus transformed into mining frequent base classifiers from the classification matrix. Several candidate ensembles are obtained by extracting better-performing base classifiers iteratively and incrementally, and the final ensemble is then selected according to a designed evaluation function. Comparative experiments demonstrate the effectiveness and validity of the EP-FP algorithm for the classification of large-scale datasets.
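The pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: the support-based ranking of base classifiers and the majority-vote accuracy used as the evaluation function are assumptions standing in for EP-FP's precise frequent-pattern mining step and evaluation criterion.

```python
from collections import Counter

def classification_matrix(predictions, labels):
    """Boolean matrix: one row per base classifier, one column per
    instance; an entry is 1 iff the classifier predicts correctly."""
    return [[int(p == y) for p, y in zip(preds, labels)]
            for preds in predictions]

def candidate_ensembles(matrix):
    """Rank base classifiers by support (fraction of instances
    classified correctly) and grow nested candidate ensembles
    incrementally, as a stand-in for the mining step."""
    supports = [(sum(row) / len(row), idx) for idx, row in enumerate(matrix)]
    order = [idx for _, idx in sorted(supports, reverse=True)]
    return [order[:k] for k in range(1, len(order) + 1)]

def evaluate(predictions, labels, members):
    """Illustrative evaluation function: majority-vote accuracy
    of the candidate sub-ensemble on the dataset."""
    correct = 0
    for i, y in enumerate(labels):
        votes = Counter(predictions[m][i] for m in members)
        if votes.most_common(1)[0][0] == y:
            correct += 1
    return correct / len(labels)

def ep_fp_prune(predictions, labels):
    """Pick the candidate ensemble that maximizes the evaluation
    function; on ties, the smaller (earlier) candidate wins."""
    matrix = classification_matrix(predictions, labels)
    best = max(candidate_ensembles(matrix),
               key=lambda ens: evaluate(predictions, labels, ens))
    return sorted(best)
```

Because candidates are nested by decreasing support, the search evaluates only a linear number of sub-ensembles rather than all subsets, which is the efficiency gain the compressed classification matrix is meant to enable.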
