Combining Prototype Selection with Local Boosting

Real-world classification problems require investigating the relationships between features in heterogeneous data sets, where different predictive models may be more appropriate for different regions of the data. One solution to this problem is the local boosting of weak classifiers ensemble method. Its main drawbacks are the time required to predict an unseen instance and the loss of classification accuracy when the local regions contain noise. In this work, an improved version of the local boosting of weak classifiers that incorporates prototype selection is presented. Experimental results on several benchmark real-world data sets show that the proposed method significantly outperforms local boosting of weak classifiers in terms of both predictive accuracy and the time needed to build a local model and classify a test instance.
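The idea can be illustrated with a minimal sketch: a prototype selection pass (here Wilson's Edited Nearest Neighbor, one common choice) first removes training instances that disagree with their neighborhood, and a small AdaBoost ensemble of decision stumps is then fit on the k nearest surviving prototypes of each query. The names below (wilson_editing, LocalBoostedStumps, k, n_estimators, edit_neighbors) are illustrative assumptions for this sketch, not the exact procedure of the paper.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors


def wilson_editing(X, y, n_neighbors=3):
    """Prototype selection via Wilson's Edited Nearest Neighbor (ENN):
    discard every training instance misclassified by its own neighbors."""
    keep = np.zeros(len(y), dtype=bool)
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        knn = KNeighborsClassifier(n_neighbors=n_neighbors).fit(X[mask], y[mask])
        keep[i] = knn.predict(X[i:i + 1])[0] == y[i]
    if not keep.any():            # degenerate case: fall back to the full set
        keep[:] = True
    return X[keep], y[keep]


class LocalBoostedStumps:
    """Sketch of local boosting on edited prototypes: for each query, fit a
    small AdaBoost ensemble (scikit-learn's default base learner is a depth-1
    tree, i.e. a decision stump) on the k nearest surviving prototypes."""

    def __init__(self, k=50, n_estimators=25, edit_neighbors=3):
        self.k = k
        self.n_estimators = n_estimators
        self.edit_neighbors = edit_neighbors

    def fit(self, X, y):
        # Editing is done once, so every local model is later built from a
        # smaller, less noisy neighborhood: faster and more robust boosting.
        X, y = np.asarray(X), np.asarray(y)
        self.X_, self.y_ = wilson_editing(X, y, self.edit_neighbors)
        self.nn_ = NearestNeighbors(n_neighbors=min(self.k, len(self.y_)))
        self.nn_.fit(self.X_)
        return self

    def predict(self, X):
        X = np.asarray(X)
        _, idx = self.nn_.kneighbors(X)
        preds = []
        for query, neighbors in zip(X, idx):
            X_loc, y_loc = self.X_[neighbors], self.y_[neighbors]
            if len(np.unique(y_loc)) == 1:   # pure neighborhood, no boosting needed
                preds.append(y_loc[0])
                continue
            local = AdaBoostClassifier(n_estimators=self.n_estimators)
            local.fit(X_loc, y_loc)
            preds.append(local.predict(query.reshape(1, -1))[0])
        return np.array(preds)
```

Because editing shrinks the stored training set once up front, each per-query ensemble is trained on fewer and cleaner neighbors, which is where the reported gains in prediction time and noise robustness would come from in a setup like this sketch.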
