An Effective Under-Sampling Method for Class Imbalance Data Problem
Data in real-world tasks are usually imbalanced, i.e. some classes have many more instances than others. This imbalance is one of the causes of reduced generalization ability in machine learning algorithms. In this paper, we address the class imbalance problem and propose an under-sampling method based on the SOM (Self-Organizing Map), a type of neural network. The method yields training data with four benefits: a large decrease in the number of majority-class instances, reduced computation time, relief of the memory shortage problem, and high-quality data that represent the majority class. We apply the method to the DARPA intrusion detection data sets, which exhibit class imbalance, using four different machine learning algorithms. Finally, we show the improvements of the new sampling method over other sampling methods.
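To make the idea of SOM-based under-sampling concrete, the following is a minimal sketch, not the implementation evaluated in the paper. It assumes a small rectangular map trained only on the majority class, and it assumes that each map node is represented by the real majority sample nearest to it; the abstract does not specify either choice. The names `train_som` and `som_undersample`, the grid size, and the learning schedule are all hypothetical.

```python
import numpy as np

def train_som(data, rows=4, cols=4, epochs=20, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns a codebook of shape (rows*cols, n_features)."""
    rng = np.random.default_rng(seed)
    n_nodes = rows * cols
    # Grid coordinates of every node, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise the codebook from randomly chosen training samples.
    init = rng.choice(len(data), size=n_nodes, replace=len(data) < n_nodes)
    weights = data[init].astype(float)
    sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(len(data)):
            x = data[idx]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3   # linearly shrinking neighbourhood
            # Best-matching unit: the node whose weight vector is closest to x.
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
            grid_dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def som_undersample(X, y, majority_label, rows=4, cols=4, seed=0):
    """Keep at most rows*cols majority samples (assumed rule: nearest sample per node)."""
    maj = X[y == majority_label]
    rest = y != majority_label
    codebook = train_som(maj, rows, cols, seed=seed)
    # Squared distances between every majority sample and every node.
    d2 = ((maj[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    # Nearest real sample per node; duplicates collapse via np.unique.
    keep = np.unique(np.argmin(d2, axis=0))
    X_new = np.vstack([maj[keep], X[rest]])
    y_new = np.concatenate([np.full(len(keep), majority_label, dtype=y.dtype), y[rest]])
    return X_new, y_new
```

With a 4 x 4 map, the majority class is reduced to at most 16 instances regardless of its original size, while all other classes are left untouched. The pairwise distance array grows with the number of majority samples, so a large data set such as the DARPA one would likely need that step processed in batches.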