Multi-class learning using data driven ECOC with deep search and re-balancing

Multi-class learning is an important task in data science. One way to achieve good performance on this task is to use Error Correcting Output Codes (ECOC), a powerful ensemble learning method that decomposes a multi-class problem into a series of binary problems and combines the resulting binary classifiers to solve the original multi-class problem. A crucial component of ECOC is the design of the coding matrix, which determines which binary problems are combined to achieve multi-class classification. There are two general ways of designing the coding matrix: one is rooted in information theory, while the other is data-driven. In this work, we investigate the data-driven approach, which was previously shown to bear greater promise, and propose a better search through the coding-matrix space. The search keeps in mind the tradeoff between efficiency and effectiveness, as well as class-imbalance issues in the underlying binary problems. After consolidating our hypotheses with a study on artificial domains, we propose the Unsupervised Deep Search Algorithm (UDS), coupled with re-sampling, to address both concerns. Our results on real-world domains show that our method outperforms traditional multi-class learning methods.
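
To make the role of the coding matrix concrete, the following is a minimal sketch of generic ECOC encoding and Hamming-distance decoding. It is not the UDS algorithm. The matrix `CODING_MATRIX` and the helpers `binary_targets`, `decode`, and `positive_fraction` are hypothetical names chosen for illustration, and the 4-class, 7-column matrix is an assumed example rather than one taken from this work.

```python
import numpy as np

# Hypothetical coding matrix for 4 classes and 7 binary problems.
# Row c is the codeword of class c; column j defines binary problem j:
# +1 puts the class on the positive side, -1 on the negative side.
CODING_MATRIX = np.array([
    [+1, +1, +1, +1, +1, +1, +1],
    [-1, -1, -1, -1, +1, +1, +1],
    [-1, -1, +1, +1, -1, -1, +1],
    [-1, +1, -1, +1, -1, +1, -1],
])


def binary_targets(y, matrix=CODING_MATRIX):
    """Relabel multi-class labels y (shape [n]) as one +1/-1 target per column."""
    return matrix[np.asarray(y)]


def decode(bits, matrix=CODING_MATRIX):
    """Map binary predictions (shape [n, n_columns]) to the class whose
    codeword is nearest in Hamming distance."""
    bits = np.asarray(bits)
    # Compare every prediction row with every codeword, count mismatches.
    distances = (bits[:, None, :] != matrix[None, :, :]).sum(axis=2)
    return distances.argmin(axis=1)


def positive_fraction(y, matrix=CODING_MATRIX):
    """Fraction of positive examples in each induced binary problem."""
    return (binary_targets(y, matrix) == 1).mean(axis=0)


if __name__ == "__main__":
    y = np.array([0, 1, 2, 3])
    bits = binary_targets(y).copy()
    bits[:, 0] *= -1  # simulate one wrong binary classifier per example
    print("decoded:", decode(bits))

    # A skewed class distribution (assumed toy data) makes some of the
    # induced binary problems imbalanced.
    y_skewed = np.array([0] * 8 + [1, 2, 3])
    print("positive fraction per column:", positive_fraction(y_skewed))
```

In this assumed matrix every pair of codewords differs in four positions, so a single wrong binary prediction still decodes to the correct class. The `positive_fraction` helper illustrates why class imbalance arises in the binary problems: grouping classes into two sides can leave one side with far fewer examples, which is the situation that re-sampling is meant to address.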
