CLeVer: A Feature Subset Selection Technique for Multivariate Time Series (Full Version)

Feature subset selection (FSS) is a preprocessing technique applied to data before data mining tasks such as classification and clustering. FSS yields both cost-effective predictors and a better understanding of the underlying process that generated the data. We propose CLeVer, a novel FSS method for multivariate time series (MTS) based on common principal component analysis. Traditional FSS techniques, such as Recursive Feature Elimination (RFE) and Fisher Criterion (FC), have been applied to MTS datasets, e.g., electroencephalogram (EEG) datasets. However, these techniques may lose the correlation information among features, whereas our technique uses the properties of principal component analysis to retain that information. To evaluate the effectiveness of the selected feature subset, we employ classification as the target data mining task. Our exhaustive experiments show that CLeVer outperforms RFE and FC by up to 100% in classification accuracy while requiring significantly less processing time (up to two orders of magnitude).
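To make the idea of ranking variables through shared principal components more concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' algorithm, whose details the abstract does not give. It assumes each MTS item is an array with one row per time step and one column per variable, estimates a shared subspace from the per-item principal-component loadings, and scores each variable by the norm of its loadings on that subspace. The function names `principal_loadings` and `rank_variables` and the parameter `n_components` are hypothetical.

```python
import numpy as np


def principal_loadings(mts, n_components):
    """Top principal-component loadings of one MTS item.

    mts: array of shape (n_timesteps, n_variables).
    Returns an array of shape (n_components, n_variables).
    """
    # The correlation matrix between variables keeps the
    # inter-variable correlation structure of the item.
    corr = np.corrcoef(mts, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top].T


def rank_variables(items, n_components=2):
    """Rank variables by their contribution to components shared across items.

    Assumption (not taken from the paper): the shared subspace is estimated
    from the leading eigenvectors of the sum of L_i^T L_i, where L_i holds
    the loadings of item i.
    """
    n_vars = items[0].shape[1]
    accumulated = np.zeros((n_vars, n_vars))
    for mts in items:
        loadings = principal_loadings(mts, n_components)
        accumulated += loadings.T @ loadings
    eigvals, eigvecs = np.linalg.eigh(accumulated)
    top = np.argsort(eigvals)[::-1][:n_components]
    common = eigvecs[:, top]  # shape (n_variables, n_components)
    # Score each variable by the L2 norm of its row of common loadings.
    scores = np.linalg.norm(common, axis=1)
    ranking = np.argsort(scores)[::-1]
    return ranking, scores
```

Calling `rank_variables(items, n_components=1)` on a list of equally wide arrays returns the variable indices ordered from highest to lowest score, from which the top-ranked variables can be kept as the selected subset. Because the scores come from the correlation structure of the whole item rather than from each variable in isolation, strongly co-varying variables tend to be ranked together.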
