Clustering Mixed Data Using Non-normal Regression Tree for Process Monitoring

Abstract—In semiconductor manufacturing, large amounts of data are collected from sensors across multiple facilities. The sensor data exhibit different characteristics depending on factors such as product type, preceding processes, and recipes. Statistical Quality Control (SQC) methods generally assume that the data are normally distributed when detecting out-of-control process states. Feeding such heterogeneous data into SQC as a single input inflates the variation of the data, requires wide control limits, and reduces the ability to detect out-of-control states. It is therefore necessary to separate groups of similar data from the mixed data for more accurate process control. In this paper, we propose a regression tree whose split algorithm is based on the Pearson distribution, so that non-normal distributions can be handled within a parametric method. The regression tree finds data with similar properties across different variables. Experiments on real semiconductor manufacturing process data show improved fault detection performance.
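The effect of mixing heterogeneous data on control limits can be illustrated with a minimal sketch. The sensor readings, the two product groups, and the 3-sigma rule used here are assumptions chosen for illustration only; they are not data or settings from the paper, and the sketch does not implement the proposed Pearson-based split.

```python
import statistics

def control_limits(values, k=3.0):
    """Return (lower, upper) limits as mean -/+ k sample standard deviations."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return mean - k * sd, mean + k * sd

def in_control(x, limits):
    """True if reading x lies within the given control limits."""
    lower, upper = limits
    return lower <= x <= upper

# Hypothetical readings from one sensor for two product types (made-up values).
group_a = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0]
group_b = [14.9, 15.2, 15.0, 15.1, 14.8, 15.0]
pooled = group_a + group_b

limits_pooled = control_limits(pooled)
limits_a = control_limits(group_a)

# A hypothetical abnormal reading taken while product A is running.
faulty_reading = 11.5

print("pooled limits:", limits_pooled, "flagged:", not in_control(faulty_reading, limits_pooled))
print("group A limits:", limits_a, "flagged:", not in_control(faulty_reading, limits_a))
```

Under these assumed values, the pooled limits are wide enough that the abnormal reading passes unnoticed, while limits computed for the product A group alone flag it. This is the motivation for separating similar data groups before applying SQC.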
