Entropy-based matrix learning machine for imbalanced data sets

Highlights: Adopt entropy to evaluate the class certainty of a pattern. Determine the corresponding fuzzy membership based on class certainty. A combination of FSVM-CIL and MatMHKS.

The imbalance problem occurs when the negative class contains many more patterns than the positive class. Since the conventional Support Vector Machine (SVM) and Neural Networks (NN) have been shown to handle imbalanced data poorly, several improved learning machines, including the Fuzzy SVM (FSVM), have been proposed. FSVM assigns a fuzzy membership to each training pattern so that different patterns contribute differently to the learning machine. How to evaluate the fuzzy membership is therefore the key issue for FSVM. Moreover, these learning machines are poorly suited to processing matrix patterns. To process matrix patterns and tackle the imbalance problem, this paper proposes an entropy-based matrix learning machine for imbalanced data sets, adopting the Matrix-pattern-oriented Ho-Kashyap learning machine with regularization learning (MatMHKS) as the base classifier. The new learning machine is named EMatMHKS, and its contributions are: (1) proposing a new entropy-based fuzzy membership evaluation approach that enhances the importance of patterns, and (2) guaranteeing the importance of positive patterns and obtaining a more flexible decision surface. Experiments on real-world imbalanced data sets validate that EMatMHKS outperforms the compared learning machines.
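The abstract does not give the membership formula, so the following is only a minimal sketch of the general idea of turning entropy-based class certainty into a fuzzy membership. Everything specific in it is an assumption made for illustration, not the paper's definition: estimating class probabilities from the k nearest neighbours, the function names `knn_entropy` and `fuzzy_memberships`, the parameter `beta`, and the rule that positive patterns keep membership 1 while negative patterns are down-weighted in proportion to their entropy.

```python
import math

def knn_entropy(points, labels, k=3):
    """Shannon entropy (natural log) of the class mix among each point's k nearest neighbours.

    Assumption: class certainty is estimated from neighbourhood label
    frequencies; the abstract does not specify the estimator.
    """
    entropies = []
    for i, p in enumerate(points):
        # Indices of the other points, sorted by Euclidean distance to p.
        neighbours = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )[:k]
        pos = sum(1 for j in neighbours if labels[j] == 1) / len(neighbours)
        neg = 1.0 - pos
        # Terms with zero probability contribute nothing to the entropy.
        entropies.append(-sum(q * math.log(q) for q in (pos, neg) if q > 0))
    return entropies

def fuzzy_memberships(points, labels, k=3, beta=0.5):
    """Hypothetical membership rule: positives get 1, negatives get 1 - beta * H / ln 2.

    With 0 <= beta < 1, every membership stays in (0, 1].
    """
    max_entropy = math.log(2)  # entropy of a 50/50 class mix
    return [
        1.0 if y == 1 else 1.0 - beta * h / max_entropy
        for y, h in zip(labels, knn_entropy(points, labels, k))
    ]

# Toy data: three negatives in a cluster, two positives, and one
# negative sitting next to the positives (the last point).
points = [(0, 0), (0, 1), (1, 0), (4, 4), (4, 5), (3, 4)]
labels = [-1, -1, -1, 1, 1, -1]
memberships = fuzzy_memberships(points, labels, k=3, beta=0.5)
```

In this toy example the clustered negatives have zero neighbourhood entropy and keep full membership, while the negative next to the positive patterns has mixed neighbours and receives a smaller membership, so it would contribute less to training than the positive patterns around it.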
