Extracting discriminative shapelets from heterogeneous sensor data

We study the problem of identifying discriminative features in Big Data arising from heterogeneous sensors. We highlight the heterogeneity of sensor data in engineering applications and the challenges of automatically extracting only the most interesting features from large datasets. We formulate the problem as classification of multivariate time series and design shapelet-based algorithms for this task. We propose a novel approach, called Shapelet Forests (SF), which combines shapelet extraction with feature selection. We compare the proposed method against other approaches for mining shapelets from multivariate time series, using data from real-world engineering applications. Quantitative analysis of the experiments shows that SF outperforms the baseline approaches and achieves high classification accuracy. In addition, the method identifies noisy sensors in multivariate data and discounts them during classification.
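
For readers unfamiliar with shapelets, the sketch below illustrates the generic idea the abstract builds on: a shapelet is a short subsequence, and its distance to a time series is the minimum Euclidean distance over all same-length windows of that series. This is not the Shapelet Forests algorithm itself. The function names `shapelet_distance` and `shapelet_features`, and the representation of a shapelet as a (sensor index, values) pair, are assumptions made for illustration only.

```python
import numpy as np


def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between `shapelet` and any
    same-length window of a univariate `series`."""
    series = np.asarray(series, dtype=float)
    shapelet = np.asarray(shapelet, dtype=float)
    if shapelet.size > series.size:
        raise ValueError("shapelet must not be longer than the series")
    # All contiguous windows of the shapelet's length, one per row.
    windows = np.lib.stride_tricks.sliding_window_view(series, shapelet.size)
    distances = np.sqrt(((windows - shapelet) ** 2).sum(axis=1))
    return float(distances.min())


def shapelet_features(sample, shapelets):
    """Map one multivariate sample to a feature vector.

    sample:    array of shape (n_sensors, n_timesteps)
    shapelets: list of (sensor_index, values) pairs (hypothetical layout)
    """
    sample = np.asarray(sample, dtype=float)
    return np.array(
        [shapelet_distance(sample[sensor], values) for sensor, values in shapelets]
    )
```

Each sample then becomes a vector of distances, one per candidate shapelet, which any standard classifier or feature selection step can consume. A feature selection step over such a vector is one plausible way to down-weight shapelets drawn from noisy sensors, though the paper's own procedure may differ.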
