Paper information - Classifying imbalanced data using an SVM ensemble with k-means clustering in semiconductor test process

In semiconductor manufacturing, predicting defective chips in advance is important for reducing test cost and stabilizing the production process early. However, the highly imbalanced datasets produced by the semiconductor test process degrade prediction performance. To improve the SVM ensemble, this study presents a methodology that uses k-means to cluster the majority class and the minority class before training an SVM. Experimental results on actual data from the semiconductor test process show that our approach outperforms other methods at classifying the imbalanced dataset.
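The abstract gives only the outline of the method, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes that the majority class is split into k clusters, that each cluster is paired with all minority samples to form a roughly balanced training subset, and that member predictions are combined by majority vote. The paper also clusters the minority class, which this sketch omits. A linear SVM trained by hinge-loss subgradient descent stands in for a full SVM library, and all function names, parameters, and data are hypothetical.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Lloyd's k-means; returns the non-empty clusters as lists of points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center (squared Euclidean distance).
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[nearest].append(p)
        # Move each center to its cluster mean; an empty cluster keeps its center.
        centers = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return [cl for cl in clusters if cl]


def train_linear_svm(X, y, lr=0.01, lam=0.01, epochs=50, seed=0):
    """Hinge-loss subgradient descent (stand-in for an SVM library); labels are +1/-1."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            w = [wj * (1 - lr * lam) for wj in w]  # L2 shrinkage
            if y[i] * score < 1:  # margin violated: step along the hinge subgradient
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def train_ensemble(majority, minority, k):
    """Assumed scheme: one SVM per majority cluster, each paired with all minority points."""
    members = []
    for cluster in kmeans(majority, k):
        X = cluster + minority
        y = [-1] * len(cluster) + [1] * len(minority)
        members.append(train_linear_svm(X, y))
    return members


def predict(members, x):
    """Majority vote (an assumption); +1 is the minority class, and ties go to +1."""
    votes = sum(
        1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
        for w, b in members
    )
    return 1 if votes >= 0 else -1


# Synthetic 9:1 imbalanced data, made up purely for illustration.
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(90)]
minority = [(rng.gauss(6, 1), rng.gauss(6, 1)) for _ in range(10)]
members = train_ensemble(majority, minority, k=3)
print(len(members), predict(members, (6.0, 6.0)), predict(members, (0.0, 0.0)))
```

Because every member sees all minority samples but only one slice of the majority class, each training subset is far less skewed than the full dataset. In practice a kernel SVM from an established library would likely replace `train_linear_svm`, and the voting rule would need to be checked against the paper.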

Jee-Hyong Lee | Eun-Mi Park
