Co-Clustering Structural Temporal Data with Applications to Semiconductor Manufacturing

Recent years have witnessed a data explosion in semiconductor manufacturing due to advances in instrumentation and storage techniques. In particular, multiple tools and chambers following the same recipe can be deployed to produce a given IC device, and multiple time series are collected during production, such as temperature, impedance, gas flow, and electric bias. These time series naturally fit into a two-dimensional array (matrix) in which each element is the time series for one process variable from one chamber. To leverage the rich structural information in such temporal data, we propose a novel framework named C-Struts that clusters the two dimensions of this array simultaneously. In this framework, we interpret the structural information as a set of constraints on the cluster membership, introduce an auxiliary probability distribution accordingly, and design an iterative algorithm that assigns each time series to a cluster on each dimension. To the best of our knowledge, we are the first to address this problem. Extensive experiments on benchmark and manufacturing data sets demonstrate the effectiveness of the proposed method.
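To make the data layout concrete, the following is a minimal sketch of the chamber-by-variable array of time series described above, together with a naive baseline that clusters each dimension independently with plain k-means. It is not the C-Struts algorithm: it ignores the structural constraints and the auxiliary distribution, and every name in it (`X`, `kmeans`, `row_labels`, `col_labels`), the synthetic signals, and the array sizes are assumptions made for illustration only.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one integer label per row of `points`."""
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct data points (fancy indexing copies).
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Euclidean distance from every point to every center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

# Synthetic stand-in for the manufacturing data (assumed sizes and signals):
# 6 chambers x 4 process variables, each cell holding a series of length 50.
n_chambers, n_vars, T = 6, 4, 50
t = np.linspace(0.0, 1.0, T)
rng = np.random.default_rng(42)

X = np.empty((n_chambers, n_vars, T))
for i in range(n_chambers):
    offset = 0.0 if i < 3 else 3.0          # chambers 3-5 run at a shifted level
    for j in range(n_vars):
        shape = np.sin(2 * np.pi * t) if j < 2 else t   # two kinds of variables
        X[i, j] = offset + shape + 0.05 * rng.standard_normal(T)

# Row view: one feature vector per chamber (all of its series concatenated).
row_labels = kmeans(X.reshape(n_chambers, -1), k=2)
# Column view: one feature vector per process variable across all chambers.
col_labels = kmeans(X.transpose(1, 0, 2).reshape(n_vars, -1), k=2)

print("chamber clusters: ", row_labels)
print("variable clusters:", col_labels)
```

Clustering the two views separately, as done here, discards the coupling between chambers and process variables; clustering both dimensions simultaneously, as the abstract proposes, is what distinguishes a co-clustering method from this baseline.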
