Co-Clustering Structural Temporal Data with Applications to Semiconductor Manufacturing

Recent years have witnessed a data explosion in semiconductor manufacturing, driven by advances in instrumentation and storage techniques. In particular, multiple tools and chambers can follow the same recipe to produce a given IC device, and during production multiple time series can be collected, such as temperature, impedance, gas flow, and electric bias. These time series naturally fit into a two-dimensional array (matrix) in which each element corresponds to the time series for one process variable from one chamber. To leverage the rich structural information in such temporal data, we propose a novel framework named C-Struts that clusters simultaneously along the two dimensions of this array. In this framework, we interpret the structural information as a set of constraints on cluster membership, introduce an auxiliary probability distribution accordingly, and design an iterative algorithm that assigns each time series to a cluster on each dimension. To the best of our knowledge, we are the first to address this problem. Extensive experiments on benchmark and manufacturing data sets demonstrate the effectiveness of the proposed method.
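To make the data layout concrete, the following is a minimal sketch, not the C-Struts algorithm itself. It assumes the chamber-by-variable array is stored as a NumPy array of shape (chambers, variables, time) and that cluster labels for both dimensions are already available; here they are hand-picked for illustration. The function name `block_means` and all sizes are hypothetical.

```python
import numpy as np


def block_means(X, row_labels, col_labels, n_row_clusters, n_col_clusters):
    """Average the time series inside each (row cluster, column cluster) block.

    X has shape (n_chambers, n_variables, T): X[i, j] is the time series
    of process variable j recorded in chamber i.
    """
    T = X.shape[2]
    means = np.full((n_row_clusters, n_col_clusters, T), np.nan)
    for r in range(n_row_clusters):
        for c in range(n_col_clusters):
            # Select the chambers in row cluster r and the variables in column cluster c.
            block = X[np.ix_(row_labels == r, col_labels == c)]
            if block.size:
                means[r, c] = block.mean(axis=(0, 1))
    return means


# Toy data (assumption): 4 chambers, 3 process variables, 50 time steps.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3, 50))

# Hand-picked labels (assumption): one cluster label per chamber and per variable.
row_labels = np.array([0, 0, 1, 1])
col_labels = np.array([0, 1, 1])

means = block_means(X, row_labels, col_labels, n_row_clusters=2, n_col_clusters=2)
```

With labels on both dimensions, every element `X[i, j]` belongs to the block `(row_labels[i], col_labels[j])`, which is the sense in which each time series receives a cluster on each dimension. How those labels are inferred under the structural constraints is the subject of the paper and is not reproduced here.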

[36]  Yada Zhu,et al.  Co-Clustering Structural Temporal Data with Applications to Semiconductor Manufacturing , 2014, ICDM.
