Pre analysis and clustering of uncertain data from manufacturing processes

As manufacturing processes grow more complex, the volume of data that has to be evaluated rises accordingly. Together, this complexity and data volume make manual data analysis infeasible, which is where data mining techniques become interesting. Applying existing techniques is not straightforward, however, because most of the data is captured by sensor measurement tools, so every measured value contains a specific error. In this paper, we propose an error-aware extension of the density-based algorithm DBSCAN. Furthermore, we discuss some quality measures that could be used to interpret the resulting clusterings. Additionally, we introduce the concept of pre-analysis during the data integration step that the proposed algorithm requires. With this concept, the runtime of the error-aware clustering algorithm can be optimized, and the integration of data mining into the overall software landscape can be promoted further.
