COMBINING ADABOOST WITH PREPROCESSING ALGORITHMS FOR EXTRACTING FUZZY RULES FROM LOW QUALITY DATA IN POSSIBLY IMBALANCED PROBLEMS

An extension of the AdaBoost algorithm for obtaining fuzzy rule-based systems from low quality data is combined with preprocessing algorithms for equalizing imbalanced datasets. Using synthetic and real-world problems, it is shown that the performance of the AdaBoost algorithm degrades in the presence of moderate uncertainty in either the input or the output values. It is also established that a preprocessing stage improves the accuracy of the classifier in a wide range of binary classification problems, including those whose imbalance ratio is uncertain.
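The abstract does not specify the algorithms involved. The sketch below is a rough, crisp-data illustration of the pipeline it describes: first rebalance the training set, then boost. It computes the imbalance ratio, equalizes the two classes, and trains a discrete AdaBoost ensemble. It rests on several assumptions. Labels are taken to be in {-1, +1}. Plain random oversampling stands in for the paper's preprocessing algorithms. Decision stumps stand in for fuzzy rules. Low quality (imprecise) inputs or outputs are not modelled at all. All function names are hypothetical.

```python
import math
import random


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (binary {-1, +1} labels).
    Assumes both classes are present."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    return max(pos, neg) / min(pos, neg)


def random_oversample(X, y, seed=0):
    """Equalize a binary dataset by duplicating randomly chosen minority examples.
    Hypothetical stand-in for the preprocessing stage (this is not SMOTE)."""
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == -1]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best one-feature stump."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted(set(x[f] for x in X)):
            for sign in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (sign if xi[f] <= thr else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best


def stump_predict(stump, x):
    _, f, thr, sign = stump
    return sign if x[f] <= thr else -sign


def adaboost(X, y, rounds=10):
    """Discrete AdaBoost with decision stumps; returns a list of (alpha, stump)."""
    n = len(y)
    w = [1.0 / n] * n                      # uniform initial weights
    model = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        err = max(stump[0], 1e-10)         # avoid log(0) on a perfect stump
        if err >= 0.5:                     # weak learner no better than chance
            break
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Up-weight misclassified examples, down-weight correct ones, renormalize
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in model)
    return 1 if score >= 0 else -1


# Toy imbalanced problem: 8 negatives, 2 positives on a single feature
X = [[i / 10] for i in range(10)]
y = [-1] * 8 + [1] * 2
Xb, yb = random_oversample(X, y)
model = adaboost(Xb, yb, rounds=5)
```

Here the imbalance ratio of the toy set is 4.0, and after oversampling both classes contain 8 examples. Rebalancing before boosting keeps the initial uniform weights from being dominated by the majority class. The paper's actual preprocessing algorithms and fuzzy rule learner may differ from this substitution.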
