A data-driven approach to selection of critical process steps in the semiconductor manufacturing process considering missing and imbalanced data

Abstract: Semiconductor wafers are fabricated through a sequence of process steps. Some of these steps are strongly related to wafer yield and are called critical process steps. Because wafer yield is a key performance index in wafer fabrication, the critical process steps should be carefully selected and managed. This paper proposes a systematic, data-driven approach for identifying them. The proposed method accounts for troublesome properties of process-step data, such as imbalanced data, missing values, and random sampling. As a case study, we analyzed hypothetical operational data and confirmed that the proposed method works well.
