Paper information - Succinctly summarizing machine usage via multi-subspace clustering of multi-sensor data

Modern industrial equipment of all kinds is instrumented with a large number of sensors that continuously transmit their readings wirelessly, giving rise to what is often referred to as the "industrial internet". Engineers often explore such data to determine the different usage patterns and behavior of similar machines. In this paper we describe a technique to automatically summarize the usage and behavioral patterns of a collection of similar machines by a small set of rules that nevertheless cover a large fraction of the observed data. We characterize the usage and behavior of a machine over a day by a collection of single-sensor histograms; thus each day is a point in a high-dimensional space. We first cluster days according to each sensor separately and then combine the clusters using communities in a specially constructed graph that considers the days shared by clusters of different sensors. In the process, some clusters of a single sensor get merged. Finally, we discover rules, each comprising memberships in clusters of possibly different sensors; we use the term multi-subspace clustering to describe such a collection of cluster-based rules. Last but not least, we attempt to cover a large fraction of observed days with a small number of such rules. We present empirical results on voluminous (hundreds of GB) real-life sensor data and also compare our technique with related work in subspace clustering and histogram summarization.
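
The pipeline outlined in the abstract (per-sensor daily histograms, per-sensor clustering of days, a graph linking clusters that share days, communities, and rules with coverage) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names, the greedy leader clustering with L1 distance, the Jaccard-overlap edge criterion, the thresholds, the use of connected components as a crude stand-in for community detection, and the toy readings.

```python
from bisect import bisect_right
from itertools import combinations

def day_histogram(readings, edges):
    """Normalized histogram of one sensor's readings for one day."""
    counts = [0] * (len(edges) - 1)
    for x in readings:
        # Clamp out-of-range readings into the first or last bin.
        i = min(max(bisect_right(edges, x) - 1, 0), len(counts) - 1)
        counts[i] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

def cluster_days(hists, radius):
    """Greedy leader clustering of {day: histogram} by L1 distance (assumed method)."""
    leaders, clusters = [], []
    for day in sorted(hists):
        h = hists[day]
        for k, lead in enumerate(leaders):
            if sum(abs(a - b) for a, b in zip(h, lead)) <= radius:
                clusters[k].add(day)
                break
        else:  # no leader close enough: this day starts a new cluster
            leaders.append(h)
            clusters.append({day})
    return clusters

def build_graph(sensor_clusters, min_jaccard):
    """Link clusters of different sensors whose day sets overlap enough."""
    nodes = {(s, k): days
             for s, cl in sensor_clusters.items()
             for k, days in enumerate(cl)}
    adj = {n: set() for n in nodes}
    for a, b in combinations(nodes, 2):
        if a[0] == b[0]:
            continue  # only connect clusters of different sensors
        jac = len(nodes[a] & nodes[b]) / len(nodes[a] | nodes[b])
        if jac >= min_jaccard:
            adj[a].add(b)
            adj[b].add(a)
    return nodes, adj

def communities(adj):
    """Connected components, a crude stand-in for community detection."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n not in group:
                group.add(n)
                stack.extend(adj[n] - group)
        seen |= group
        groups.append(group)
    return groups

def rules_and_coverage(nodes, groups, all_days):
    """One rule per multi-sensor community, plus the fraction of days covered."""
    rules = []
    for g in groups:
        per_sensor = {}
        for s, k in g:  # merge same-sensor clusters inside a community
            per_sensor.setdefault(s, set()).update(nodes[(s, k)])
        if len(per_sensor) < 2:
            continue  # skip communities confined to a single sensor
        days = set.intersection(*per_sensor.values())
        if days:
            rules.append((sorted(g), days))
    covered = set().union(*(d for _, d in rules))
    return rules, len(covered) / len(all_days)

# Toy data (hypothetical): two sensors, four days, two histogram bins.
edges = [0, 50, 100]
readings = {
    "speed": {1: [10, 20, 30], 2: [15, 25, 60], 3: [70, 80, 90], 4: [75, 85, 95]},
    "temp": {1: [5, 10], 2: [20, 30], 3: [60, 70], 4: [80, 40]},
}
sensor_clusters = {
    s: cluster_days({d: day_histogram(r, edges) for d, r in days.items()}, 0.7)
    for s, days in readings.items()
}
nodes, adj = build_graph(sensor_clusters, 0.5)
rules, coverage = rules_and_coverage(nodes, communities(adj), {1, 2, 3, 4})
for members, days in rules:
    print(members, "->", sorted(days))
print("coverage:", coverage)
```

In this toy run, the two small "temp" clusters holding day 3 and day 4 end up in the same community because both overlap the same "speed" cluster. This mirrors the abstract's remark that some clusters of a single sensor get merged. A real implementation would likely replace the connected-components step with a proper community detection algorithm and choose rules to trade off rule count against coverage.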
