A Hybrid Artificial Neural Network-Naive Bayes for solving imbalanced dataset problems in semiconductor manufacturing test process

This paper introduces a hybrid approach, the Hybrid Artificial Neural Network-Naive Bayes classifier, for classifying two-class imbalanced datasets. An imbalanced dataset from the semiconductor manufacturing test process is chosen as a case study. Predicting test outputs in semiconductor manufacturing helps engineers identify good and bad products earlier and keep bad units from being processed further, which shows the practical significance of solving the imbalance problem. The proposed hybrid approach rests on the idea that an Artificial Neural Network (ANN) guides the Naive Bayes classifier toward better decisions by supplying it with an additional input. Several experiments compare the hybrid with the individual classifiers, ANN and Naive Bayes. The results show that the proposed hybrid approach performs better than the individual classifiers and overcomes the imbalanced dataset problem in the semiconductor manufacturing test process.
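The idea of an ANN supplying an additional input to Naive Bayes can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: a single sigmoid neuron stands in for the ANN, the Naive Bayes variant is Gaussian, the function names (`train_neuron`, `augment`, `fit_gaussian_nb`, `predict_nb`) are hypothetical, and the 90/10 toy data is synthetic rather than semiconductor test data.

```python
import math
import random

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to keep exp() from overflowing
    return 1.0 / (1.0 + math.exp(-z))

def ann_score(w, b, x):
    """Output of the stand-in ANN: a value in [0, 1] for class 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_neuron(X, y, epochs=200, lr=0.1):
    """Assumption: one sigmoid neuron trained by stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = ann_score(w, b, xi) - yi  # gradient of log loss w.r.t. pre-activation
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def augment(X, w, b):
    """Append the ANN output to each row as one extra feature for Naive Bayes."""
    return [list(x) + [ann_score(w, b, x)] for x in X]

def fit_gaussian_nb(X, y):
    """Per class: prior plus (mean, variance) of every feature."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        stats = []
        for col in zip(*rows):
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-6  # variance floor
            stats.append((mean, var))
        model[c] = (len(rows) / len(X), stats)
    return model

def predict_nb(model, x):
    """Return the class with the largest log posterior under the Gaussian model."""
    def log_posterior(c):
        prior, stats = model[c]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
            for v, (mean, var) in zip(x, stats))
    return max(model, key=log_posterior)

# Synthetic imbalanced toy data: 90 "good" units (class 0), 10 "bad" units (class 1).
rng = random.Random(0)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)]
     + [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(10)])
y = [0] * 90 + [1] * 10

w, b = train_neuron(X, y)              # step 1: train the stand-in ANN
X_aug = augment(X, w, b)               # step 2: ANN output becomes an extra input
nb = fit_gaussian_nb(X_aug, y)         # step 3: Naive Bayes on the augmented inputs
preds = [predict_nb(nb, x) for x in X_aug]
```

In this sketch the ANN score is computed on the same rows used to train it, which is enough to show the data flow but would overstate performance in a real evaluation; scoring held-out data is the safer choice.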
