Cost-sensitive hierarchical classification for imbalance classes

Hierarchical classification with imbalanced classes is a challenge in machine learning that arises when the data are unevenly distributed across classes. Learning from an imbalanced dataset can degrade the performance of the classifier. Cost-sensitive learning is a useful way to handle the gap between the probabilities of majority and minority classes. This paper proposes cost-sensitive hierarchical classification for imbalance classes (CSHCIC), which constructs a cost-sensitive factor to balance the relationship between majority and minority classes. First, we use the class hierarchy to divide a large hierarchical classification task into several small subclassification tasks. Second, we establish a cost-sensitive factor that reflects more precisely the number of samples of each class within a subclassification task. Then, we calculate the probability of every node using logistic regression. Lastly, we update the cost-sensitive factor using a flexibility factor and the number of samples. The experimental results show that the proposed cost-sensitive hierarchical classification method performs well on datasets with imbalanced classes. Its running time is also lower than that of most state-of-the-art methods.
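The four steps above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract gives no formula for the cost-sensitive factor, so the form used here (largest sibling count divided by own count, raised to a `flexibility` exponent) is an assumption, as are the toy hierarchy, the synthetic data, and every function name. Each internal node of the hierarchy gets its own cost-weighted logistic regression, and prediction proceeds top-down.

```python
import numpy as np

# Hypothetical class hierarchy (not from the paper): root -> A, B -> four leaves.
HIERARCHY = {"root": ["A", "B"], "A": ["a1", "a2"], "B": ["b1", "b2"]}
PARENT = {child: node for node, children in HIERARCHY.items() for child in children}


def path_to_root(label):
    """Return the label followed by all of its ancestors."""
    path = [label]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def cost_factors(counts, flexibility=1.0):
    """ASSUMED form: (largest child count / own count) ** flexibility."""
    counts = np.maximum(np.asarray(counts, dtype=float), 1.0)
    return (counts.max() / counts) ** flexibility


def add_bias(X):
    """Append a constant column so the model learns an intercept."""
    return np.hstack([X, np.ones((len(X), 1))])


def softmax(logits):
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


def fit_node(X, y, sample_cost, n_children, lr=0.5, epochs=300):
    """Cost-weighted multinomial logistic regression via plain gradient descent."""
    Xb, Y = add_bias(X), np.eye(n_children)[y]
    W = np.zeros((Xb.shape[1], n_children))
    for _ in range(epochs):
        P = softmax(Xb @ W)
        W -= lr * Xb.T @ (sample_cost[:, None] * (P - Y)) / sample_cost.sum()
    return W


def fit_hierarchy(X, leaf_labels, flexibility=1.0):
    """Train one small classifier per internal node (the subclassification tasks)."""
    paths = [path_to_root(label) for label in leaf_labels]
    models = {}
    for node, children in HIERARCHY.items():
        rows, y = [], []
        for i, path in enumerate(paths):
            for k, child in enumerate(children):
                if child in path:  # sample i belongs to the subtree of this child
                    rows.append(i)
                    y.append(k)
        y = np.array(y)
        counts = np.bincount(y, minlength=len(children))
        cost = cost_factors(counts, flexibility)
        models[node] = fit_node(X[rows], y, cost[y], len(children))
    return models


def predict(models, x):
    """Top-down prediction: follow the most probable child until a leaf."""
    node = "root"
    while node in HIERARCHY:
        probs = softmax(add_bias(x[None, :]) @ models[node])[0]
        node = HIERARCHY[node][int(np.argmax(probs))]
    return node


def make_toy_data(seed=0):
    """Imbalanced Gaussian blobs, one per leaf (synthetic, for illustration only)."""
    rng = np.random.default_rng(seed)
    centers = {"a1": (-1, -1), "a2": (-1, 1), "b1": (1, -1), "b2": (1, 1)}
    sizes = {"a1": 200, "a2": 10, "b1": 100, "b2": 5}
    X = np.vstack([rng.normal(centers[c], 0.3, size=(sizes[c], 2)) for c in centers])
    labels = [c for c in centers for _ in range(sizes[c])]
    return X, labels, centers


X, labels, centers = make_toy_data()
models = fit_hierarchy(X, labels)
for leaf, center in centers.items():
    print(leaf, "->", predict(models, np.array(center, dtype=float)))
```

In this sketch, setting `flexibility=0` makes every cost factor equal to 1, which recovers unweighted logistic regression, while larger values push each node's classifier harder toward its minority children. How the paper actually defines and updates the factor may differ.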
