Paper Information - Normalized Label Propagation for Imbalanced Scenario Classification

Normalized Label Propagation for Imbalanced Scenario Classification

Graph-based semi-supervised classification (GSSC) algorithms combine labeled data with the information implied by unlabeled data to train a classifier that predicts the labels of the unlabeled data, and they have recently attracted considerable attention in machine learning. However, the performance of the classifier depends largely on the initial training dataset. In an imbalanced scenario, where different classes have different numbers of labeled samples in the training dataset, traditional semi-supervised classifiers tend to perform poorly on low-frequency classes. In this paper, we propose an effective method, called the normalized label propagation (NLP) algorithm, to solve the imbalance problem. Under an independence assumption, NLP balances the initial label information of the different classes. Experimental results on different datasets show that the proposed method adapts better and achieves higher classification accuracy.
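The abstract does not state the NLP update rule, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' formulation. It assumes that "balancing the initial label information" can be approximated by dividing each class column of the initial label matrix by that class's labeled-sample count, and it uses the well-known closed-form propagation F = (I - alpha S)^{-1} Y0 from the local and global consistency method as the propagation step. The function names, the RBF affinity, and the parameters `sigma` and `alpha` are all hypothetical choices made for illustration.

```python
import numpy as np


def rbf_affinity(X, sigma=1.0):
    """Dense RBF affinity matrix with a zero diagonal (illustrative graph choice)."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def initial_labels(y, n_classes, normalize=True):
    """Build the initial label matrix Y0 from labels y (-1 marks unlabeled).

    With normalize=True, each class column is divided by its labeled count,
    so every class with at least one label starts with total mass 1.
    This normalization is an assumption, not taken from the paper.
    """
    y = np.asarray(y)
    Y0 = np.zeros((len(y), n_classes))
    labeled = y >= 0
    Y0[labeled, y[labeled]] = 1.0
    if normalize:
        counts = Y0.sum(axis=0)
        Y0 = Y0 / np.maximum(counts, 1.0)  # avoid dividing by zero for empty classes
    return Y0


def propagate(W, Y0, alpha=0.9):
    """Closed-form label propagation: F = (I - alpha * S)^{-1} Y0.

    S = D^{-1/2} W D^{-1/2} is the symmetrically normalized affinity matrix.
    Assumes every node has positive degree.
    """
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - alpha * S, Y0)


def classify(W, y, n_classes, alpha=0.9, normalize=True):
    """Predict a class for every node by taking the largest propagated score."""
    F = propagate(W, initial_labels(y, n_classes, normalize), alpha)
    return F.argmax(axis=1)
```

Calling `classify(rbf_affinity(X), y, n_classes)` with `normalize=False` gives the unnormalized baseline, in which a class with many labeled samples injects proportionally more label mass into the graph. Comparing the two settings on a dataset with skewed label counts is one way to explore the effect the abstract describes.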
