Normalized Label Propagation for Imbalanced Scenario Classification

Graph-based semi-supervised classification (GSSC) algorithms, which combine labeled data with the information implicit in unlabeled data to train a classifier that predicts the labels of the unlabeled data, have recently attracted considerable attention in machine learning. However, the performance of such a classifier depends largely on the initial training dataset. In an imbalanced scenario (different classes have different numbers of labeled samples in the training dataset), traditional semi-supervised classifiers tend to perform poorly on low-frequency classes. In this paper, we propose an effective method, called the normalized label propagation algorithm (NLP), to solve the imbalance problem. Under an independence assumption, NLP balances the initial label information of the different classes. Experimental results on different datasets show that the proposed method offers better adaptability and higher classification accuracy.