An Experimental Evaluation of Fault Diagnosis from Imbalanced and Incomplete Data for Smart Semiconductor Manufacturing

The SECOM dataset contains information about a semiconductor production line, including the products that failed the in-house test line and their attributes. Like most semiconductor manufacturing data, this dataset contains missing values, imbalanced classes, and noisy features. In this work, we address these challenges and evaluate a range of classification approaches for fault diagnosis. We present an experimental evaluation of 288 combinations of data pruning, data imputation, feature selection, and classification methods to identify the approaches best suited to this task. Furthermore, we introduce a novel data imputation approach, “In-painting KNN-Imputation”, and show that it outperforms the commonly used data imputation technique. The results show the capability of each classifier, feature selection method, data generation method, and data imputation technique, together with a full analysis of their respective parameter optimizations.
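As an illustration of the imputation step, the following is a minimal sketch of plain k-nearest-neighbour imputation. It is not the In-painting KNN-Imputation method introduced in this work, whose details are not given here. The function name `knn_impute`, the choice of `None` as the missing-value marker, and the distance normalisation over shared features are all assumptions made for the example.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Hypothetical helper for illustration only. Distance is the root mean
    squared difference over the features observed in both rows.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf  # no common features, cannot compare
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(r) for r in rows]  # copy so the input stays untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows in which feature j is observed.
            donors = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            donors = [(d, v) for d, v in donors if d != math.inf][:k]
            if donors:  # leave the gap if no comparable donor exists
                imputed[i][j] = sum(v for _, v in donors) / len(donors)
    return imputed


# Toy data (made up for the example): one missing sensor reading in row 0.
sample = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 5.0],
    [9.0, 9.0, 100.0],
]
filled = knn_impute(sample, k=2)
print(filled[0][2])  # 4.0, the mean of the two nearest rows' values
```

Distances are computed from the original rows rather than from already-imputed values, so the result does not depend on the order in which gaps are filled.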
