Identification of key features using topological data analysis for accurate prediction of manufacturing system outputs

Abstract

Topological data analysis (TDA) has emerged as one of the most promising approaches for extracting insights, in an unsupervised manner, from high-dimensional data of varying types such as images, point clouds, and meshes. To the best of our knowledge, we provide the first successful application of TDA in the manufacturing systems domain. We apply a widely used TDA method, the Mapper algorithm, to two benchmark data sets: one for chemical process yield prediction and one for semiconductor wafer fault detection. The algorithm yields topological networks that capture the intrinsic clusters in each data set and the connections among those clusters, which are difficult to detect using traditional methods. By analyzing the network shapes, we select the key process variables, or features, that impact the system outcomes. We then use predictive models to evaluate the impact of the selected features. Results show that models built on the selected features achieve at least the same high prediction accuracy as models built on all the process variables, thereby providing a way to carry out process monitoring and control more cost-effectively.
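
To illustrate the kind of network the Mapper algorithm produces, the following is a minimal sketch in Python, not the implementation used in this work. The filter function (here simply the first coordinate), the cover parameters `n_intervals` and `overlap`, the distance threshold `eps`, and the single-linkage clustering step are all assumptions chosen for brevity; the abstract does not specify any of them.

```python
import math
from itertools import combinations


def cover_intervals(lo, hi, n_intervals, overlap):
    """Cover [lo, hi] with equal-length intervals; neighbors overlap by the given fraction."""
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]


def threshold_clusters(points, idx, eps):
    """Single-linkage clustering (assumed here): connected components of the
    graph that links any two points at Euclidean distance <= eps."""
    remaining = set(idx)
    clusters = []
    while remaining:
        seed = remaining.pop()
        component, stack = {seed}, [seed]
        while stack:
            p = stack.pop()
            near = {q for q in remaining if math.dist(points[p], points[q]) <= eps}
            remaining -= near
            component |= near
            stack.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper(points, filter_values, n_intervals=4, overlap=0.5, eps=1.0):
    """Return (nodes, edges): nodes are clusters of point indices found inside
    each cover interval; an edge joins two nodes that share at least one point."""
    lo, hi = min(filter_values), max(filter_values)
    tol = 1e-9 * max(1.0, hi - lo)  # guard against floating-point round-off at interval ends
    nodes = []
    for a, b in cover_intervals(lo, hi, n_intervals, overlap):
        idx = [i for i, f in enumerate(filter_values) if a - tol <= f <= b + tol]
        nodes.extend(threshold_clusters(points, idx, eps))
    edges = [(i, j) for i, j in combinations(range(len(nodes)), 2) if nodes[i] & nodes[j]]
    return nodes, edges


# Toy data (hypothetical): two well-separated groups of points on a line.
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (10.0, 0.0), (10.5, 0.0), (11.0, 0.0)]
filter_values = [p[0] for p in points]  # assumed filter: the first coordinate
nodes, edges = mapper(points, filter_values)
print(len(nodes), "nodes;", len(edges), "edges")  # 2 nodes; 0 edges
```

On this toy input the two groups appear as two disconnected nodes. On real process data, groups of nodes whose points differ in outcome (for example, yield or fault status) would be the starting point for comparing process variables between groups.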
