Algorithm-Level Approaches

Algorithm-level solutions can be seen as an alternative to data pre-processing methods for handling imbalanced datasets. Instead of modifying the training set to counter class skew, this approach modifies the classifier's learning procedure itself. This requires an in-depth understanding of the selected learning approach, in order to identify which specific mechanism is responsible for the bias towards the majority class. Since algorithm-level solutions do not cause any shift in the data distribution, they are more adaptable to various types of imbalanced datasets, at the cost of being specific to a given classifier type. In this chapter we will discuss the basics of algorithm-level solutions and review existing skew-insensitive modifications. The background will be introduced first in Sect. 6.1. Then, special attention will be given to four groups of methods: modifications of support vector machines (SVMs) will be discussed in Sect. 6.2, skew-insensitive decision trees in Sect. 6.3, variants of nearest neighbor (NN) classifiers for imbalanced problems in Sect. 6.4, and skew-insensitive Bayesian classifiers in Sect. 6.5. Finally, one-class classifiers will be discussed in Sect. 6.6, while Sect. 6.7 will conclude the chapter and present future challenges in the field of algorithm-level solutions to class imbalance.
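To make the contrast with data pre-processing concrete, here is a minimal sketch of one algorithm-level modification of the kind covered under skew-insensitive decision trees. It replaces the entropy-based split criterion with the Hellinger distance criterion used in Hellinger distance decision trees. The sketch assumes a binary problem in which the minority class is labeled positive and a candidate split is summarized by per-branch class counts. The function names `hellinger_split_distance` and `information_gain` are hypothetical, chosen only for illustration. The training set is never resampled; only the quantity the tree uses to rank splits changes.

```python
import math
from typing import Sequence


def hellinger_split_distance(pos_counts: Sequence[int], neg_counts: Sequence[int]) -> float:
    """Hellinger distance between the class-conditional distributions over the
    branches of a candidate split (binary problem, minority class = positive).

    pos_counts[j] and neg_counts[j] are the numbers of positive and negative
    training examples routed to branch j (hypothetical input format).
    """
    if len(pos_counts) != len(neg_counts):
        raise ValueError("expected one positive and one negative count per branch")
    total_pos, total_neg = sum(pos_counts), sum(neg_counts)
    if total_pos == 0 or total_neg == 0:
        raise ValueError("both classes must be present at the node")
    # Only the within-class fractions p/total_pos and n/total_neg enter the
    # formula, so the class prior (the imbalance ratio) cancels out.
    squared = sum(
        (math.sqrt(p / total_pos) - math.sqrt(n / total_neg)) ** 2
        for p, n in zip(pos_counts, neg_counts)
    )
    return math.sqrt(squared)


def _entropy(counts: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a class-count vector."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(pos_counts: Sequence[int], neg_counts: Sequence[int]) -> float:
    """Standard entropy-based gain of the same split, shown for comparison."""
    n = sum(pos_counts) + sum(neg_counts)
    parent = _entropy([sum(pos_counts), sum(neg_counts)])
    # Weighted entropy of the non-empty child branches.
    children = sum(
        (p + q) / n * _entropy([p, q])
        for p, q in zip(pos_counts, neg_counts)
        if p + q > 0
    )
    return parent - children


if __name__ == "__main__":
    # The same split of 10 minority examples; the majority class is then made 10x larger.
    pos = [8, 2]
    neg, neg_scaled = [20, 80], [200, 800]
    print(hellinger_split_distance(pos, neg))         # about 0.632
    print(hellinger_split_distance(pos, neg_scaled))  # same value
    print(information_gain(pos, neg), information_gain(pos, neg_scaled))
```

Multiplying the majority-class counts by ten leaves the Hellinger distance unchanged, because it depends only on how each class is spread across the branches. The information gain of the very same split, however, shrinks considerably as the skew grows. This illustrates the point made above: the bias is addressed inside the learner's criterion rather than by shifting the data distribution, and the fix applies only to this classifier type.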
